Gene flow among populations in mainland China is very common; thus, it is difficult to find a population segregated from gene flow (that is, a population that maintains a simple genetic background). In molecular epidemiological studies, genetic samples from populations with simpler genetic backgrounds are more useful than those from populations with complex genetic diversity. Epidemiologists therefore constantly seek out isolated populations. About 10 years ago, many researchers began to take note of a special population, the Hei-Yi Zhuang (Black Clothes Zhuang).1 This population resides in the mountainous area of the China–Vietnam borderland and has maintained most of the Zhuang ethnic group's cultural traditions, whereas other Zhuang populations have not. The group also claims to be strictly endogamous, which could result in genetic isolation. Many epidemiological studies have therefore been conducted in this group.2 However, the isolation of this population has never been tested genetically.

The Hei-Yi Zhuang, alternatively called ‘Minz’,3 resides in Napo County, in the southwest of the Guangxi Zhuang Autonomous Region. Neighboring Pinghua Han Chinese, Mien, Kimmun, Hmong, Lolo and Yerong3 populations also live in different areas of Napo County. To detect possible gene flow between the Minz Zhuang and the Han (Gaoshan Pinghua) in Napo (Supplementary Figure 1), we studied Y chromosome and mitochondrial DNA (mtDNA) polymorphisms in these two populations. Samples from 130 Minz (63 males and 67 females) and 205 Han (82 males and 123 females) were used in this study. Our study was approved by the Fudan School of Life Sciences Ethics Committee. All subjects provided signed informed consent. We employed the same methods as in our previous Pinghua population study.4 Thirty single-nucleotide polymorphisms and 14 short tandem repeats (STRs) on the Y chromosome were typed (Supplementary Table 1). The hypervariable segments of mtDNA were sequenced, and 21 coding region single-nucleotide polymorphisms were typed using a SNaPshot assay (Applied Biosystems, Foster City, CA, USA; Supplementary Table 2). All sequences have been submitted to GenBank (accession numbers: GU108620-947). The haplogroups were determined according to the most recent nomenclature for the Y chromosome5 and mtDNA.6

The Y chromosomes from the Minz samples were clustered into 12 haplogroups (Table 1). Four of these haplogroups, O1a*, O2a*, C3* and D1, have high frequencies in the Minz samples. O1a* and O2a* are dominant in the Daic ethnic group, which includes the Zhuang,7 suggesting that the Minz are a typical Daic population. C3* is distributed widely across East Asia with a low frequency in most populations except for North Asian Altaic populations.8 D1 is common in Tibet and neighboring areas, but is very rare in Southeast Asia.9 The moderate frequencies of these two haplogroups among the Minz may result from the genetic drift of certain ancestral contributors to the Minz. The Han sample from Napo formed only eight haplogroups (Table 1), all of which are shared with the Minz. The haplogroup specific to Southeast Asia, O2a*, reaches its highest frequency among the Napo Han, and O3a3c1 is the second most frequent haplogroup in this population. A previous study showed that the Han Chinese originated in North China and that the Y chromosomes of the Han mostly belong to haplogroup O3.10 The Y-chromosome diversity of the Napo Han indicates that this population has common Han Chinese characteristics but was also strongly influenced by indigenous populations. The genetic affinity between the Napo Han and the indigenous Daic populations was further supported by principal component analysis using the SPSS 15.0 program. The principal component plot (Supplementary Figure 2a) distinguished a Han cluster (upper left quadrant) and a Daic cluster (lower right quadrant). The Minz and Napo Han both fall within the Daic cluster, showing that the major paternal components of these two population samples derive from indigenous groups of the south.

Table 1 Y-chromosome and mitochondrial DNA (mtDNA) haplogroup frequencies among Minz Zhuang and Napo Han populations

On the maternal side, mtDNA haplogroups for the Minz are mainly D4, R9c, M*, M7b* and B4a, whereas those for the Napo Han are mainly M7b*, M*, M7* and R9b. These haplogroups are predominantly derived from southern China11 and are not those dominant among the Han Chinese (for example, A, C, D, G, M8a, Y and Z).10 The principal component plot (Supplementary Figure 2b) also shows that the Minz and Napo Han fall within the southern China cluster rather than the Han Chinese cluster. Notably, the Minz and Napo Han are very close to each other in both plots, indicating possible gene flow between the two groups.
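The principal component analyses described above were run in SPSS 15.0; the settings used are not reported here. As a rough illustration of the idea only, the following Python sketch extracts the first principal component from a matrix of haplogroup frequencies by power iteration. It is a minimal sketch under stated assumptions: columns are centered but not standardized, the population labels are hypothetical, and the frequencies are invented toy values, not those of Table 1.

```python
import math

def center_columns(matrix):
    """Subtract each column mean so every haplogroup has mean frequency 0."""
    n_rows = len(matrix)
    means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]

def first_principal_component(matrix, iterations=200):
    """Return (loadings, scores) of PC1 via power iteration on X^T X.

    Assumption: covariance-based PCA (centering only); SPSS may have been
    configured differently in the study.
    """
    centered = center_columns(matrix)
    n_cols = len(centered[0])
    # Deterministic, non-uniform start vector (a uniform vector would be
    # orthogonal to every centered row when the row frequencies sum to 1).
    loadings = [1.0 / (j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, loadings)) for row in centered]
        updated = [sum(row[j] * s for row, s in zip(centered, scores))
                   for j in range(n_cols)]
        norm = math.sqrt(sum(u * u for u in updated))
        if norm == 0.0:  # no variance in the data
            break
        loadings = [u / norm for u in updated]
    scores = [sum(x * w for x, w in zip(row, loadings)) for row in centered]
    return loadings, scores

# Hypothetical toy frequencies (NOT from the study); columns are three
# made-up haplogroup classes, rows are four made-up populations.
toy_labels = ["north_like_1", "north_like_2", "south_like_1", "south_like_2"]
toy_freqs = [
    [0.10, 0.60, 0.30],
    [0.15, 0.55, 0.30],
    [0.50, 0.15, 0.35],
    [0.45, 0.20, 0.35],
]

pc1_loadings, pc1_scores = first_principal_component(toy_freqs)
for label, score in zip(toy_labels, pc1_scores):
    print(label, round(score, 3))
```

Populations with similar haplogroup profiles receive similar scores, which is what places them in the same cluster on a principal component plot. The sign of a component is arbitrary, so only the relative positions are meaningful.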

To further investigate gene flow, we applied network analyses (NETWORK 4.516) to both Y-chromosome STR haplotypes and mtDNA hypervariable segment motifs within the major haplogroups of the two populations (Figure 1). The Y haplogroup D1 in the Minz population formed only one STR haplotype. This haplotype is shared with the Hmong from Laos and is linked upstream to Sino-Tibetan populations, indicating that D1 lineages in the Minz might have arrived from Sino-Tibetan populations by a circuitous route and undergone genetic drift. Y haplogroup C3* in the Minz formed five STR haplotypes, suggesting that it did not undergo genetic drift comparable to that of D1. Other Y and mtDNA haplogroups show higher diversity in the Napo populations. Interestingly, a Napo-specific clade was found in the mtDNA haplogroup M7b. The hypervariable segment motif for this clade is 16129-16192-16223-16297. This clade includes five haplotypes and forms a sun-shaped structure. All five haplotypes can be found among the Napo Han, but only the central haplotype can be found among the Minz. This suggests gene flow from the Napo Han to the Minz. We counted the unique shared and connected haplotypes among populations in these networks to assess gene flow (Table 2). The results show that no paternal gene flow can be detected between the Minz and Napo Han, but maternal gene flow is pronounced. Gene flow from other Daic or Han populations to the Minz and Napo Han was also observed.
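The counting of shared and connected haplotypes can be illustrated with a simplified Python sketch. It assumes that a Y-STR haplotype is represented as a tuple of repeat counts, and it treats two haplotypes as "connected" when they differ by a single repeat unit at exactly one locus. This is a simplification: it does not reproduce the NETWORK program or the upstream direction inferred from the networks, and the haplotypes below are hypothetical toy values, not data from the study.

```python
def shared_haplotypes(pop_a, pop_b):
    """Distinct haplotypes observed in both population samples."""
    return set(pop_a) & set(pop_b)

def one_step_pairs(pop_a, pop_b):
    """Pairs (a, b) of distinct haplotypes that differ by one repeat unit
    at exactly one locus (an assumed stand-in for 'connected')."""
    pairs = set()
    for a in set(pop_a):
        for b in set(pop_b):
            differences = [abs(x - y) for x, y in zip(a, b) if x != y]
            if differences == [1]:
                pairs.add((a, b))
    return pairs

# Hypothetical three-locus haplotypes (NOT from the study).
toy_pop_a = [(14, 12, 23), (14, 12, 23), (15, 12, 23), (13, 11, 22)]
toy_pop_b = [(14, 12, 23), (15, 13, 23), (16, 12, 23)]

toy_shared = shared_haplotypes(toy_pop_a, toy_pop_b)
toy_connected = one_step_pairs(toy_pop_a, toy_pop_b)
print(len(toy_shared), "shared;", len(toy_connected), "one-step pairs")
```

Working on sets rather than raw sample lists means that each distinct haplotype is counted once, regardless of how many individuals carry it.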

Figure 1 Networks of the major haplogroups in Napo Han and Zhuang. (a) Y chromosome, (b) mitochondrial DNA.

Table 2 Unique shared and upstream connected haplotypes counted from the networks

Genetic diversities of both populations were calculated. The Y-STR diversity of the Minz is lower than that of the Napo Han, but mtDNA diversities are almost the same (Supplementary Table 3), indicating that the gene flow in maternal lineages is stronger than that in paternal lineages.
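The diversity estimator used is not specified in the text. One common choice is Nei's unbiased haplotype diversity, h = n/(n − 1) × (1 − Σ p_i²), where n is the sample size and p_i is the frequency of the i-th haplotype. The Python sketch below assumes that estimator; the function name and the sample haplotypes are hypothetical and are not taken from Supplementary Table 3.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of observed haplotypes.

    Assumption: this is one standard estimator, not necessarily the one
    used in the study.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    counts = Counter(haplotypes)
    sum_squared_freqs = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_squared_freqs)

# Hypothetical samples (NOT from the study): haplotypes are just labels here.
toy_low = ["H1", "H1", "H1", "H1"]    # a single haplotype, no diversity
toy_mixed = ["H1", "H1", "H2", "H3"]  # one haplotype carried twice
toy_high = ["H1", "H2", "H3", "H4"]   # every individual is distinct

for sample in (toy_low, toy_mixed, toy_high):
    print(sample, haplotype_diversity(sample))
```

Under this estimator, a sample that carries a single haplotype (as reported for Y haplogroup D1 in the Minz) has a diversity of 0, whereas a sample in which every individual carries a different haplotype has a diversity of 1.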

Thus, we show that gene flow between the Han and Zhuang populations in Napo is pronounced, especially in maternal lineages. Cultural difference does not appear to be a barrier to gene flow here, and the Minz Zhuang is not a genetically isolated population. Their lineages may derive from Daic populations, as well as from Sino-Tibetan and Hmong-Mien populations. Although the borderland between China and Vietnam is moderately segregated geographically, populations arrived in this area from various regions. Therefore, the Minz might not be a truly isolated population for epidemiological studies. Examining autosomal DNA polymorphisms would help to further assess the population admixture. Nevertheless, genetic studies on the Minz remain very relevant, as the Minz can be viewed as a half-way point population in the migration of the Thai-Lao population from China.