Main

With the rise of the agricultural revolution, the evolution of food production has transformed global human demographics within the Holocene.1 Bio-cultural adaptations to new diets, particularly those associated with farming and animal husbandry, have left several signatures in human genomes.2, 3 In one of the best-known cases, digesting lactose from milk requires continued production of lactase throughout adult life (lactase persistence, LP; OMIM #223100). This trait is caused by genetic variants acting in cis on the lactase gene (LCT).4 Four causative single-nucleotide polymorphisms (SNPs) 13.9 kb upstream of LCT, namely −13910C/T (rs4988235), −13907C/G (rs41525747), −13915T/G (rs41380347) and −14010G/C, have been successively identified as candidate cis-acting elements on the basis of genotype–phenotype association analyses and functional experiments during the past decade.5 The derived allele −13910*T is associated with LP in European6 and some Central Asian7 populations. This allele has also been identified in Indian populations and is inferred to contribute substantially to LP in India.8 The remaining three alleles, −13907*G, −13915*G and −14010*C, are responsible for LP in some African and Middle Eastern populations.9, 10 These results suggest that LP alleles have emerged independently in several geographic/ethnic groups,2 most likely driven by recent positive selection,10, 11 as well as by demographic factors (for example, migration12).

On the Tibetan Plateau, milk and milk products (for example, from domestic yaks) are important components of the daily diet of Tibetans, especially herders.13 A lactose tolerance test with measurement of breath hydrogen revealed that LP prevalence in Tibetans (9/30; 30%) was significantly higher than in Han Chinese (1/30; 3.3%).14 This result accords with the history of milk consumption among Tibetans14 and with the absence of milk from traditional Han Chinese diets.15 However, how the culture of milk use has shaped the genetic diversity of Tibetan populations remains an enigma. To investigate genetic variants for LP in Tibetans, we sequenced a 321-bp region (positions −14044 to −13724 upstream of LCT) covering the four previously identified SNPs (−13907C/G, −13910C/T, −13915T/G and −14010G/C) in DNA samples from 495 Tibetan individuals living in Tibet (Figure 1). The PCR and sequencing protocols have been reported previously.9 All sequences have been deposited in GenBank (accession numbers: JQ395072–JQ395566). In addition, allele −22018*A16, but not allele −13910*T17, has been suggested to be associated with LP in populations from Northern China. We therefore further screened −22018G/A (rs182549) in our samples using the RFLP method described previously.16
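The prevalence comparison quoted above (9/30 Tibetans versus 1/30 Han Chinese) can be checked with a small sketch. The text does not say which statistical test the cited study used, so the two-sided Fisher's exact test below is an assumption chosen for illustration, and the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumption: this test is used here only for illustration; the cited
    study's actual test is not stated in the text.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k "positive" individuals in row 1
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Counts from the text: LP / non-LP in Tibetans (9/30) and Han Chinese (1/30)
p_value = fisher_exact_two_sided(9, 21, 1, 29)
print(f"two-sided p = {p_value:.4f}")
```

Under this assumed test the difference falls below the conventional 0.05 threshold, which is consistent with the "significantly higher" wording of the cited result.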

Figure 1

Population frequencies of the −13910*T and −22018*A alleles associated with LP in populations surrounding Tibet. Frequency information was retrieved from the published literature.7, 8, 16, 17 The sampling locations in Tibet are also shown on the map.

For the four SNPs (−13907C/G, −13910C/T, −13915T/G and −14010G/C), the alleles responsible for LP are completely absent from the Tibetan samples (Table 1 and Supplementary Table S1). For −22018G/A, only one heterozygote is present, in the population from Nagqu. These results indicate that LP in Tibetans may not be explained by these known SNPs. Among the published populations with LP around Tibet (Figure 1), −13910*T accounts for a substantial proportion of LP in Central Asian7 and Indian8 populations, and −22018*A is likely associated with LP in populations from Northern China.16 Previous studies of mitochondrial DNA and Y-chromosome variation in Tibetans detected some, albeit rare, gene flow from Central Asians and South Asians.18, 19 Neolithic genetic components from Northern China were shown to contribute substantially to the current Tibetan gene pool.18, 19, 20 Nevertheless, the LP-associated alleles in those surrounding populations were not introgressed into Tibetan populations through prehistoric or historical migrations. Taken together, we propose that LP in Tibetans is likely to have an independent origin.

Table 1 Allele of variants in the LCT enhancer in Tibetan populations

In this study, we identified three additional SNPs in the sequenced region: −13838G/A, −13906T/A and −13908C/T (Table 1 and Supplementary Table S1). Both the −13906*A (0.6%; 6/990) and −13908*T (0.1%; 1/990) alleles occurred at low frequencies. Their sporadic distribution suggests that these two rare mutations are unlikely to be associated with LP in Tibetan populations, although the possibility that LP is due to multiple low-frequency mutations cannot be excluded completely. For −13838G/A, apart from its absence in Baqên and Nyima (Table 1), the frequency ranges from 1.9% (1/54, Rinbung) to 20.8% (5/24, Baingoin) in Tibetan populations, with an overall frequency of 6.6% (65/990). No significant deviations from Hardy–Weinberg equilibrium were observed for −13838G/A in the whole Tibetan population or in the two geographically defined sub-populations (Table 2). Interestingly, site −13838 lies within the binding motif (positions −13854 to −13830 upstream of LCT) for HNF4α, a transcription factor for intestinal gene expression.21 A previous functional study revealed that this motif is important for the enhancer activity of −13910*T and that co-expression of HNF4α can increase both −13910*T and −13910*C enhancer activities.21 This evidence implies that SNP −13838G/A deserves more attention. To determine whether −13838G/A and as yet unidentified variants are responsible for LP in Tibetans, further comprehensive genotype–phenotype association analyses and functional experiments are required.
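The percentages quoted in this paragraph follow from dividing allele counts by the number of chromosomes sampled (two per individual, so 990 for 495 individuals). A minimal sketch of that arithmetic, using only the counts given in the text; the helper name `allele_frequency_percent` is hypothetical:

```python
def allele_frequency_percent(allele_count, n_chromosomes):
    """Allele frequency as a percentage of sampled chromosomes."""
    return 100.0 * allele_count / n_chromosomes


N_INDIVIDUALS = 495
N_CHROMOSOMES = 2 * N_INDIVIDUALS  # diploid samples: 990 chromosomes

# (allele count, chromosomes sampled), all taken from the text above
reported = {
    "-13906*A (overall)": (6, N_CHROMOSOMES),
    "-13908*T (overall)": (1, N_CHROMOSOMES),
    "-13838G/A (overall)": (65, N_CHROMOSOMES),
    "-13838G/A (Rinbung)": (1, 54),
    "-13838G/A (Baingoin)": (5, 24),
}

for label, (count, total) in reported.items():
    freq = allele_frequency_percent(count, total)
    print(f"{label}: {count}/{total} = {freq:.1f}%")
```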

Table 2 The χ2 Hardy–Weinberg equilibrium test for −13838G/A
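As an illustration of the kind of test named in this table, the sketch below computes a χ2 goodness-of-fit statistic for Hardy–Weinberg equilibrium at a biallelic site, with one degree of freedom. The genotype counts in the usage lines are hypothetical and are not the study's data; the function name `hwe_chi2` is likewise an assumption made for this example.

```python
import math


def hwe_chi2(n_ref_hom, n_het, n_alt_hom):
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic site.

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternative allele frequency
    observed = (n_ref_hom, n_het, n_alt_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts, NOT taken from the study
chi2_fit, p_fit = hwe_chi2(81, 18, 1)   # matches HWE expectations exactly
chi2_dev, p_dev = hwe_chi2(90, 0, 10)   # no heterozygotes: strong deviation
print(f"in equilibrium: chi2 = {chi2_fit:.3f}, p = {p_fit:.3f}")
print(f"deviating:      chi2 = {chi2_dev:.3f}, p = {p_dev:.3g}")
```

A large p-value, as in the first hypothetical case, corresponds to "no significant deviation" from equilibrium. For rare alleles with small expected homozygote counts, an exact test is often preferred over the χ2 approximation.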