Introduction

The Liqian people live in Yongchang County (2,000 m above sea level) in Gansu Province, China. A small proportion of the Liqian people have been described as having mixed racial morphological traits. In recent years, the Liqian population has become well known because its members have been regarded as descendants of ancient Roman legionnaires who disappeared from European history after the battle of Carrhae in southeastern Turkey in 53 B.C.E. (Dubs 1955). Homer Hasenpflug Dubs speculated that these survivors made their way east to Uzbekistan to enlist with Jzh Jzh (Dubs 1957). After the Jzh Jzh war in 36 B.C.E., more than 100 Jzh Jzh soldiers, described as Roman soldiers, settled in the present Yongchang County of Gansu Province in China (Harris 1992; Song et al. 2005). To test this hypothesis, a growing number of scholars have studied the origin of the Liqians (Harris 1992; Dubs 1957; Song et al. 2005). Because historical and archeological evidence is lacking, the hypothesis rests largely on speculation and remains controversial. In the absence of other direct evidence, genetic evidence is particularly valuable. Single nucleotide polymorphisms (SNPs) and short tandem repeat (STR) loci in the non-recombining region of the human Y chromosome (NRY) have been widely used to trace the origin and migratory events of modern paternal genetic lineages (Jobling and Tyler-Smith 2003). Therefore, in our study Y-chromosome polymorphisms were used to investigate the paternal genetic landscape of the Liqians and to provide genetic evidence bearing on the suggested origin of the Liqian people.

Materials and methods

Sample collection and DNA extraction

Blood samples were obtained from 227 unrelated male individuals representing four populations from northwest China: 39 Tibetans from Guide County of Qinghai Province, 49 Uygurs from Urumqi city of the Xinjiang Uygur Autonomous Region, 87 Liqian individuals from Yongchang County of Gansu Province, and 52 Yugurs from Su’nan Yugur Autonomous County of Gansu Province (Fig. 1). DNA was extracted from fresh blood using standard phenol–chloroform protocols.

Fig. 1

Geographic distribution of Chinese samples mentioned in the present study (redrawn from an image from 2006 Earthsat, Google Earth software)

Typing of Y chromosome DNA polymorphisms

We typed 12 binary Y-chromosome markers by size detection, restriction fragment length polymorphism (RFLP) analysis and denaturing high performance liquid chromatography (DHPLC). Three insertion/deletion polymorphisms were detected by agarose electrophoresis: YAP (Hammer 1994), 12f2 (Rosser et al. 2000) and MSY2 (Bao et al. 2000). Five markers were typed by RFLP: SRY10831 (Whitfield et al. 1995), M9, M17 (Underhill et al. 1997), Tat (Zerjal et al. 1997) and RPS4Y (Bergen et al. 1999). The remaining four, M122, M175, M45 and M89 (Underhill et al. 2001b), were genotyped by DHPLC.

The 12 Y-chromosomal short tandem repeats (Y-STRs) included in the Promega Corporation PowerPlex® Y System (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a, DYS385b, DYS437, DYS438 and DYS439) were typed in all 227 individuals of our study. Alleles were scored for each marker by comparison with the PowerPlex® Y System allelic ladder (Promega Corporation, Madison, WI). An Applied Biosystems 3100 automated sequencer in combination with GeneScan™ 3.1 analysis software was used to detect and size the amplified fragments.

Data analysis

Arlequin 2.0 software (Schneider et al. 2000) was used to compute Rst distances and associated P values based on 10,000 permutations (Slatkin 1995), to test the correlation between genetic and geographic distances (measured from the geographical center of each region) (Mantel 1967) and to measure haplogroup/haplotype diversity (Nei 1987). Principal component (PC) analysis based on Y-SNP haplogroup frequencies and multidimensional scaling (MDS) analysis based on the Rst matrix were carried out using SPSS 12.0. A neighbor-joining tree based on microsatellite allele frequencies was obtained using programs in the PHYLIP (v. 3.5) package (Felsenstein 2001). The program Admix 2.0 (Dupanloup and Bertorelle 2001) was used to calculate the admixture proportions based on Y-STR data. Median-joining network analysis (Bandelt et al. 1999) of haplogroup-associated haplotypes was performed using NETWORK 4.0.
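As an illustration of the diversity measure cited above (Nei 1987), the following minimal sketch computes the unbiased haplogroup/haplotype diversity, n/(n-1) × (1 − Σp_i²), from a list of observed types. The function name and the toy sample are assumptions made for illustration only; they are not the study data, and Arlequin's own implementation may differ in detail.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Unbiased gene diversity (Nei 1987): n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    counts = Counter(haplotypes)
    # Sum of squared relative frequencies of each distinct type
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical toy sample of haplogroup labels (not the study data)
sample = ["O3", "O3", "O3", "DE", "C", "O3", "K*", "O3"]
diversity = haplotype_diversity(sample)

# A sample in which every chromosome carries a different type
all_unique = haplotype_diversity(["h1", "h2", "h3", "h4"])
```

A sample dominated by one type gives a low value, whereas a sample in which every chromosome is distinct gives the maximum value of 1.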

Results

Haplogroup/haplotype distributions and associated diversity

All 12 binary markers were found to be polymorphic, and these defined 11 haplogroups (Fig. 2). The diversity of the Yugur, Han Chinese and Uygur populations was high; in contrast, the Liqians showed low genetic diversity, with a single predominant haplogroup, O3-M122 (71.3%) (Fig. 2). Haplogroup O3 showed moderate frequency in the Yugur, Inner Mongolian and Han Chinese populations (Xue et al. 2006) and low frequency in the Tibetans and the Uygurs. In our study, haplogroup DE, defined by YAP, occurred at low frequency in the Liqians and was also present in the Yugurs, but amounted to 48.7% in the Tibetans. The lineage characterized by RPS4Y is predominant in the Inner Mongolians (Xue et al. 2006), with moderate frequency in the Han Chinese and Yugur populations, but it is rare in the Liqian and other Chinese populations (Xue et al. 2006). R1a1*-M17, prevalent in the Middle East and Europe (Luis et al. 2004), amounts to 28.6% in the Uygurs, whereas it is rare in most Chinese populations, including the Liqian and Yugur people. Notably, the lineages I, J and H, frequent in West Eurasia (Semino et al. 2000), are lacking in the Liqian people, whereas haplogroup O, prevalent in the Liqian and other East Asian populations, is lacking in Central Asia and West Eurasia.

Fig. 2

Haplogroup frequencies (%) and genetic diversity (D ± SE) of populations analyzed in our study. Haplogroups were named according to Jobling and Tyler-Smith (2003) and the Y Chromosome Consortium (YCC) (2002). F* is F*(xJ, K), K* is K*(xN3, O, P), O* is O*(xO1, O3), and P* is P*(xR1a1)

Further analysis of the Y-STRs provided a more detailed view of Liqian Y chromosome variation. The 227 Y chromosomes representing the four populations were typed with 12 Y-specific microsatellites. A total of 209 distinct complete haplotypes were identified, of which 75 belong to the Liqian people. Haplotype diversities were higher than 0.98.

Worldwide data for the 12-locus haplotype are unavailable in the Y-STR Haplotype Reference Database (YHRD) (Roewer et al. 2001), but a subset of 9 of the STRs (the minimal haplotype) is included. A comparison based on the minimal haplotype yielded 16 matches, most of which (12/16) belong to East Asia and South Asia. Of the remaining 4 matches, 2 are found only in African populations, and the other 2 are shared by European and Asian populations. All four of these Liqian Y chromosomes belong to haplogroup O3-M122. No West Eurasian-specific matches were observed in Liqian Y chromosomes.
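The database comparison described above reduces each 12-locus profile to the 9-locus minimal haplotype before looking for identical entries. A minimal sketch of that reduction and matching step is given below. The allele values, the reference labels and the function names are hypothetical and serve only to illustrate the logic; this is not the YHRD search interface.

```python
# The 9 loci of the minimal haplotype, a subset of the 12 PowerPlex Y loci
MINIMAL_LOCI = ("DYS19", "DYS389I", "DYS389II", "DYS390", "DYS391",
                "DYS392", "DYS393", "DYS385a", "DYS385b")


def minimal_haplotype(profile):
    """Reduce a profile (locus -> repeat count) to the 9-locus tuple."""
    return tuple(profile[locus] for locus in MINIMAL_LOCI)


def find_matches(query, reference):
    """Return labels of reference profiles identical at all 9 loci."""
    key = minimal_haplotype(query)
    return [label for label, profile in reference.items()
            if minimal_haplotype(profile) == key]


# Hypothetical 12-locus profile (allele values invented for illustration)
query = {"DYS19": 15, "DYS389I": 12, "DYS389II": 28, "DYS390": 23,
         "DYS391": 10, "DYS392": 14, "DYS393": 12, "DYS385a": 12,
         "DYS385b": 18, "DYS437": 14, "DYS438": 10, "DYS439": 12}

# Hypothetical reference entries that carry only the 9 minimal loci
ref_a = {locus: query[locus] for locus in MINIMAL_LOCI}
ref_b = dict(ref_a, DYS390=24)  # differs from the query at one locus
reference = {"ref_A": ref_a, "ref_B": ref_b}

matches = find_matches(query, reference)
```

Because only exact identity at all nine loci counts as a match, a single-repeat difference at one locus (as in the hypothetical `ref_B`) excludes the entry.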

Principal component analysis

For comparative analysis, principal component analysis of Y haplogroup frequencies included published data from Chinese, Outer Mongolian, Central Asian and West Eurasian populations (Fig. 3). The PC plot based on haplogroups shared by all populations reveals that the Liqians are close to most populations in China and clearly distinguished from the Central Asian and West Eurasian populations. In particular, the Liqians are closely related to the Yugur, Han Chinese and Mongolian populations.

Fig. 3

Principal component analysis based on Y-SNP haplogroup frequencies in China and Outer Mongolia (Xue et al. 2006), Central Asia (Wells et al. 2001) and West Eurasia (Wells et al. 2001; Semino et al. 2000). PC1 explains 61.15% of the variation and PC2 explains 17.83%. Squares represent Central Asian, West Eurasian and Outer Mongolian populations, and triangles represent Chinese populations

Rst values and MDS analysis

Microsatellite allele frequency data were used to calculate pairwise Rst genetic distances and to perform MDS analysis. Because original STR data were not available for populations from Central Asia, pairwise Rst values based on nine STRs (the minimal haplotype) and the subsequent MDS analysis were restricted to China, West Eurasia and Outer Mongolia (Fig. 4). Most Y-STR data taken from the literature were collected using high-resolution genotyping technologies, which made further genetic analysis more reliable. The Liqian population shows non-significant Rst values with respect to the Yugur and Han Chinese populations mentioned in our study and significant Rst values with respect to all other populations (data not shown). Fig. 4 portrays the results of MDS. When all populations were included, the populations lying at the periphery, with extreme pairwise Rst values, compressed all the remaining populations in the MDS plot, so that most populations cluster in the center (Fig. 4a). To resolve this cluster, we reanalyzed the central group of populations, which are not marked in Fig. 4a. A good fit between the two-dimensional plot and the source data (pairwise Rst values) was obtained in Fig. 4b, as demonstrated by the low stress value (0.037). The Liqian and the Han Chinese populations were placed in the lower left part of the plot, somewhat distinct from the Mongolians and Yugurs, and quite separate from the West Eurasian populations.

Fig. 4

Two-dimensional MDS plot based on pairwise Rst values in a populations throughout Eurasia and b populations lying in the center and not marked in Fig. 4a. Populations from West Asia [Arab (Immel et al. 2004), Anatolian (Çakır et al. 2004)], Europe [Albanian (Robino et al. 2002), North Italian (Cerri et al. 2005), Croatian (Lovrečic et al. 2005), Sicilian (Robino et al. 2005), Greek (Robino et al. 2004), Central Italian (Ricci et al. 2001), Gypsy and Macedonian (Peričić et al. 2005)] and China [Han (He’nan) (Feng et al. 2005), Han (Liaoning) (Wang et al. 2006), Han (Ningxia) (Zhu et al. 2006c), Mongolians (Zhu et al. 2005a), Tibetans (Zhu et al. 2006b), Uygurs (Yili) (Zhu et al. 2005b), Tibetans (Qinghai), Uygurs (Urumqi), Yugurs and Liqians (present study)]

Mantel test

Microsatellite loci with high mutation rates are more suitable for studying recent human evolution than binary markers (Karafet et al. 2001). We therefore based the subsequent analysis on Y-STR data. To test the correlation between genetic and geographic distances, the Mantel test was performed for the Liqian, Mongolian, Yugur and Han Chinese populations, the latter three being closely related to the Liqian people in the MDS plots (Fig. 4). The Mantel test is based on pairwise Rst values computed from 11 Y-STRs because DYS437 was not analyzed in the Mongolian population or in the Han Chinese populations from Liaoning and Ningxia. The result showed a significant positive correlation between pairwise genetic distances (Rst matrix) and pairwise geographic distances (r = 0.646, P = 0.003) (Table 1), indicating that genetic distances increase with geographical distances, as observed in northern groups of East Asia (Karafet et al. 2001).
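The logic of the Mantel test used here can be sketched as follows: the Pearson correlation between the off-diagonal elements of the two distance matrices is compared with its distribution under random relabeling of the populations in one matrix. The sketch below is a generic permutation implementation under that assumption, with hypothetical function names and toy matrices; it is not Arlequin's code, and it will not reproduce the values reported above.

```python
import random
from itertools import combinations
from math import sqrt


def _pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(gen, geo, permutations=9999, seed=0):
    """One-sided Mantel test on two symmetric distance matrices."""
    n = len(gen)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]

    def corr(order):
        # Relabel populations in the genetic matrix, then correlate
        gen_vec = [gen[order[i]][order[j]] for i, j in pairs]
        return _pearson(gen_vec, geo_vec)

    order = list(range(n))
    observed = corr(order)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if corr(order) >= observed:
            hits += 1
    # Count the observed arrangement itself among the permutations
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: five populations placed along a line (km),
# with genetic distance exactly proportional to geographic distance
positions = [0, 100, 300, 600, 1000]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.0001 * d for d in row] for row in geo]
r, p = mantel(gen, geo)
```

Rows and columns are permuted together, which preserves the dependence among the distances that share a population; this is why an ordinary significance test for a correlation coefficient is not appropriate for distance matrices.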

Table 1 Pairwise Rst values (below diagonal) (significance level = 0.05) and pairwise geographic distances (km) (above diagonal)

Phylogenetic analysis

In the MDS analysis, the Liqian population is closely related genetically to the Han Chinese, Inner Mongolian and Yugur populations. To better understand their relationship, we constructed a neighbor-joining phylogenetic tree, with a Tunisian population (Frigi et al. 2006) as the outgroup. The phylogenetic analysis thus included the Liqian, Han Chinese, Yugur and Inner Mongolian populations from Fig. 4 together with the Tunisian outgroup. Because DYS437 was not typed in all of these populations, the analysis was restricted to the 11 STR markers shared by the 7 populations in Fig. 5. The relationships among these populations are illustrated in the tree (Fig. 5). The Liqian population and the Han Chinese populations from He’nan and Ningxia are scattered on one branch. The Mongolians, the Han Chinese from Liaoning and the Yugurs each formed a separate branch. Overall, the Liqian people are closely related to adjacent Han Chinese populations and slightly distinguished from the Yugurs in Fig. 5.

Fig. 5

Neighbor-joining tree based on 11 Y-STR haplotype frequencies. Internal numbers are bootstrap values (%) for 1,000 replicates. The Tunisian population was used as the outgroup

Admixture analysis

The PC and MDS analyses suggest that the Liqians are close to Chinese populations. If we assume that the Liqians are a genetic admixture of several parental populations, then the Han Chinese and Mongolian populations should be representative of these parental populations. Although the Yugurs are genetically close to the Liqians, they were excluded as a parental population in the admixture analysis because the Yugur people are considered a kindred population (Song et al. 2005) and are of more recent origin than the Liqian population (Song et al. 2005; Hao 1999). The mY estimate is based on a coalescent approach and is suitable for analyzing microsatellite data (Bertorelle and Excoffier 1998). For the same reason as for the Mantel test, the present admixture analysis is based only on the data for 11 Y-STRs. Our results suggest that a large majority of Liqian Y chromosomes have a Han Chinese origin, with a small Mongolian contribution (Table 2). In parallel, admixture estimates were obtained for the Yugur people with the same program in order to test the relationship between the Liqian and Yugur people. The result indicated that the Liqians and the Yugurs have similar proportional contributions from the Han Chinese and Mongolians, which is compatible with the historical hypothesis that the Yugurs and Liqians are kindred populations with common origins (Song et al. 2005).

Table 2 Admixture estimates for Liqians and Yugurs

Median-joining network

Principal component and admixture analyses suggest that the Liqians are closely related to the Yugurs; in contrast, an underlying difference between the Liqian and Yugur people was observed in the phylogenetic analysis (Fig. 5). To further understand the relationship between the Liqian and Yugur populations, a median-joining network of haplotypes associated with haplogroup O was constructed (Fig. 6). The network grouped these Y chromosomes into two clusters: one consists almost exclusively of Liqian chromosomes, and the other is shared by Liqian chromosomes and most Yugur Y chromosomes. It is interesting to note that the Yugur cluster is surrounded by Liqian Y chromosomes. In general, mutation, recombination and migration can increase variation (Zerjal et al. 2003). On this basis, the Liqian people may have a more complex demographic history than the Yugur people. This feature explains the close genetic affinity, accompanied by an underlying genetic difference, between the Liqian and Yugur populations, and it also reflects the fact that the haplotype diversity of the Liqians is higher than that of the Yugurs within haplogroup O. Overall, the Liqian and the Yugur haplotypes associated with haplogroup O are completely intermingled.

Fig. 6

Median-joining network of haplotypes based on 12 Y-STRs associated with haplogroup O of Liqian and Yugur chromosomes. Circles have an area proportional to haplotype frequency. The smallest circle of the network corresponds to a single Y chromosome. Liqian Y-chromosomes are represented in gray, Yugurs in black

Discussion

This study presents the first extensive analysis of Y-chromosome variation in the Liqians. This allowed us to compare our data with those previously reported for worldwide populations in order to investigate the origin and evolution of the Liqian people. The results based on Y-SNPs are consistent with those obtained with Y-STRs, which reinforces our findings.

In the present study, it was unexpected that 71.3% of the Liqian Y chromosomes belonged to haplogroup O3-M122, which is an East Asian-specific haplogroup (Su et al. 1999; Shi et al. 2005). The O3-M122 frequency in the Liqians was by far the highest observed among populations in northern China (Xue et al. 2006; Su et al. 1999, 2000b; Shi et al. 2005). Historical records firmly establish that the Liqians have been living in northwestern China since the Han dynasty (Song et al. 2005). Combined with these historical records, the haplogroup distribution suggests that most of the Liqian Y chromosomes can be traced back to northern China. This result was also reflected in the PC and MDS analyses, which indicated that the Liqians are genetically close to Chinese populations, especially Han Chinese populations in various regions. It was further supported by the admixture and phylogenetic analyses (Table 2, Fig. 5), suggesting a strong Han Chinese paternal influence on the Liqian gene pool. The Mantel test suggests that the Liqian people, wherever they originated, must have had extensive gene exchange with the local people. Besides the Han Chinese, the Mongolians and Yugurs in China are genetically related to the Liqian people.

Previous studies demonstrated that the male lineages of the Mongolians spread rapidly across a large part of Asia (Zerjal et al. 2002, 2003). The Hexi region in northern China, where the Liqians have settled, was controlled by the Mongols in the 13th and 14th centuries. It is therefore plausible that the Mongolians had an impact on the Liqian gene pool. Consistent with this, a small Mongolian contribution was observed in the Liqian gene pool in our study.

The Yugurs, described as a kindred population of the Liqians, were genetically close to the Liqians in the PC and MDS plots, which is compatible with the admixture analysis. The admixture estimates indicated that the Liqian and the Yugur populations have similar proportional contributions from the Han Chinese and Mongolian populations. Nevertheless, an underlying genetic difference between the Liqian and Yugur populations was revealed by the neighbor-joining tree and the median-joining network (Fig. 5, Fig. 6). This difference may result from long-term isolation by distance. In addition, the Yugurs are strictly endogamous and live in autonomous areas in the central part of Gansu Province, whereas the Liqians live together with Han Chinese. Overall, the genetic difference between the Liqian and Yugur populations is statistically non-significant.

In the PC and MDS analyses, the Liqian population is clearly distinguished from Central Asian and West Eurasian populations. This result is incompatible with the historical hypothesis that the Liqian people derived from ancient Roman soldiers, who probably included mercenaries from West Asia (Huang et al. 1990). When Liqian haplotypes were compared with worldwide populations in the YHRD, no matches exclusive to West Eurasian populations were found, and only two matches were shared by European and East Asian populations. These two Liqian haplotypes are present in West European populations, but they belong to the East Asian-specific haplogroup O3. This apparent discrepancy probably arose from the high mutation rate of Y-STRs, which can produce identical haplotypes on unrelated lineages. The two Y chromosomes are therefore more likely to be Asian lineages than European lineages. The failure to find an apparent link between the Liqian people and ancient Roman soldiers in this study might be either because long-distance migration and intermarriage have erased earlier genetic signatures or because the Liqians are simply a general population of north China.

Moreover, it is noteworthy that the mixed racial traits seen in a small proportion of Liqian people are not necessarily associated with ancient Roman soldiers. Along the ancient Silk Road in north China, it is common to see people with Caucasian morphological traits, which are also characteristic of Chinese minority ethnic groups in Xinjiang (such as the Uygurs). Therefore, we cannot infer a Liqian origin from morphological traits alone.

As described above, the Liqians are closely related to Chinese populations, especially the Han Chinese in north China. In addition, the Liqian and Yugur populations are likely to be kindred populations. No obvious signature of a Roman soldier origin is observed in the Liqian paternal gene pool. A Roman mercenary origin for the Liqian people is therefore likely to be nothing more than an interesting theory. To reveal the genetic landscape of the Liqians completely, complementary autosomal and mtDNA studies will need to be carried out in future work.