Introduction

Intestinal lactase phlorizin hydrolase is the brush border disaccharidase responsible for the digestion of dietary lactose. In the majority of the world’s population lactase activity does not persist into adult life but declines after weaning. In some people, however, particularly in northern Europe, lactase activity persists into adulthood. The frequency of these two phenotypes varies in different populations of the world. Lactose tolerance tests in families have suggested that the polymorphism is controlled by two alleles at a single gene locus with persistence being dominant to non-persistence [1, for review 2].

Studies of lactase activity in samples of adult intestine from populations of unrelated individuals show a clear trimodal distribution [3–5]. The frequencies of the individuals in these three groups are consistent with the two-allele model, in which the group of individuals with intermediate activity represent the heterozygotes and the other groups are the homozygotes. This suggested that the relevant genetic element(s) may be cis-acting and hence be within or close to the lactase gene (LCT).

Despite sequence analysis of 1 kb of the promoter region and the complete cDNA of the lactase gene in a few individuals of known phenotype, the molecular basis of this polymorphism is not yet known. Single base changes have been seen, but none of these were obviously associated with the lactase persistence/non-persistence polymorphism [6, 7]. The level at which the difference in expression of lactase is regulated has also been controversial [8–10]. Recent studies suggest that in most cases lactase non-persistent individuals show a lower level of lactase mRNA [4, 11]. These findings do not, however, distinguish between a cis- or a trans-acting mechanism.

In order to determine whether the lactase gene is directly implicated in the lactase persistence status, we have searched for polymorphisms with a view to making the LCT gene more informative for family and association studies, and for identifying individual lactase transcripts. We have focused on regions of the gene in which base changes had already been reported [6, 7]. We made use of polymerase chain reaction (PCR)-based techniques that are sensitive to the detection of a wide variety of small base alterations in DNA, namely single-strand conformational analysis (SSCA) [12], denaturing gradient gel electrophoresis (DGGE) [13] and simple polyacrylamide gel electrophoresis (PAGE). High resolution was obtained by using very small quantities of DNA and detection by silver staining. In addition we have determined the precise location of a previously described MspI polymorphism [14].

Materials and Methods

Samples

50 large sibship families obtained from the Centre d’Etude du Polymorphisme Humain (CEPH) [15] were investigated in this study. 37 were originally from Dr. Ray White’s laboratory in Utah, 10 were from France and 3 from other sources. Genomic DNA was obtained directly from CEPH or prepared from blood samples or cell lines using an ABI 340A Nucleic Acid Extractor.

Polymerase Chain Reaction

Four segments of the lactase gene were amplified: a portion 5′ to the coding region (5F); a portion spanning the second exon (F2); a region extending from exon 16 to exon 17 (LCT3); and a region of 3′ untranslated sequence spanning the polyadenylation signal (UT). The sequences of the oligonucleotide primers and the sizes of the PCR products are given in table 1. The oligonucleotide primers were synthesised using an ABI 391-PCR-MATE.

Table 1 Sequence and position of the oligonucleotide primers within the lactase gene and the sizes of the PCR products

Fragments were amplified using the reaction mix recommended by the manufacturers of the Taq polymerase (Advanced Biotechnologies or Promega). The conditions of the amplification for the 5F fragment were: after initial denaturation for 5 min at 95 °C, 30 cycles of amplification consisting of denaturation for 20 s at 94 °C, annealing for 20 s at 47 °C and elongation for 40 s at 70 °C. The F2 fragment was denatured as above but the subsequent 30 cycles consisted of denaturation for 20 s at 94 °C, annealing for 20 s at 52 °C and elongation for 20 s at 70 °C. The cycles for the UT fragment were as for 5F except the elongation was only for 20 s. The LCT3 fragment was amplified using 30 cycles consisting of a denaturation stage as above, annealing for 20 s at 53 °C and elongation for 40 s at 70 °C. The F17/LCT fragment was amplified using primers LCT3A and F17S for 30 cycles of 20 s each at 94, 50 and 70 °C. 10 µl of 5F PCR products were digested using 0.8 U of the restriction enzyme AvaII (BRL) in a final volume of 40 µl using the conditions recommended by the manufacturers. 10 µl of LCT3 or F17/LCT were digested by MspI (BRL) under similar conditions.
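The cycling conditions above can be collected into a small lookup structure, which makes the differences between fragments easier to compare. The following Python sketch is illustrative only: the names `Step`, `PROGRAMS` and `cycling_minutes` are hypothetical, the LCT3 denaturation step is assumed to be 20 s at 94 °C ("as above"), and the initial 5-min denaturation and instrument ramp times are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    seconds: int
    temp_c: int


def cycle(anneal_c: int, extend_s: int) -> tuple:
    """One PCR cycle: 20 s at 94 C, 20 s annealing, elongation at 70 C."""
    return (
        Step("denature", 20, 94),
        Step("anneal", 20, anneal_c),
        Step("extend", extend_s, 70),
    )


N_CYCLES = 30  # all fragments were amplified for 30 cycles

# Annealing temperature and elongation time per fragment, as stated in the text.
# Assumption: LCT3 uses the same 20 s / 94 C denaturation as the other fragments.
PROGRAMS = {
    "5F": cycle(anneal_c=47, extend_s=40),
    "F2": cycle(anneal_c=52, extend_s=20),
    "UT": cycle(anneal_c=47, extend_s=20),
    "LCT3": cycle(anneal_c=53, extend_s=40),
    "F17/LCT": cycle(anneal_c=50, extend_s=20),
}


def cycling_minutes(fragment: str) -> float:
    """Total programmed hold time for the 30 cycles, excluding ramping."""
    per_cycle = sum(step.seconds for step in PROGRAMS[fragment])
    return N_CYCLES * per_cycle / 60


if __name__ == "__main__":
    for name in PROGRAMS:
        print(name, cycling_minutes(name), "min")
```

Under these assumptions the 5F programme amounts to 40 min of programmed hold time and the F2 programme to 30 min.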

Single-Strand Conformation Analysis

Samples were either mixed 1:1 with loading buffer (consisting of 95% deionised formamide, 20 mM EDTA, 0.05% xylene cyanol and 0.05% bromophenol blue) in the case of digested products, or were mixed 1:1:2 (sample:water:loading buffer) in the case of neat PCR products. They were then heated to 85–95 °C for 5–10 min and snap cooled on ice. The 1-kb ladder and φX174 digested with HaeIII (BRL) were used as molecular weight markers and reference points. The gel compositions used were 6% acrylamide (37.5:1, acrylamide:bis, Bio-Rad) in 0.086 M Tris, 1.9 mM EDTA and 0.09 M borate buffer, pH 8.4 (1 × TBE) or in 0.5 × TBE, with or without the addition of 5% glycerol. The gel size was 17 × 13 cm × 0.8 mm, and electrophoresis was performed in a BRL vertical gel tank. Electrophoresis was carried out for various times with voltage limiting at 400 V, either in a cold room, temperature 4–10 °C, the gel surface temperature remaining below 30 °C, or at room temperature (approximately 22 °C). The conditions defined for the routine analysis of the 5F PCR product were: 1 × TBE, 5% glycerol for 2 h, in a cold room, and for the F2 product were: 0.5 × TBE, 5% glycerol for 1.5 h, in a cold room.

Denaturing Gradient Gel Electrophoresis Analysis

Electrophoresis was carried out on a modified Hoefer SE 600 vertical electrophoresis apparatus [16]. The gel was submerged in 0.04 M Tris-acetate, 1 mM EDTA, pH 7.4 (1 × TAE) at 61 °C with circulation of electrolyte between anode and cathode. The gels, 14 × 18 cm × 0.75 mm, consisted of 10% acrylamide (37.5:1 acrylamide:bis, Bio-Rad) in 1 × TAE with a linear 40–50% gradient of chemical denaturant (100% denaturant being 7 M urea, 40% formamide). Digests of PCR products (5 µl) were mixed with an equal volume of 1 × TAE buffer and 2.5 µl of loading buffer (loading buffer composition was 20% Ficoll, 0.5% bromophenol blue, 10 mM Tris, 1 mM EDTA, pH 7.8). Electrophoresis was performed with voltage limiting at 35 V (approximately 65 mA) for 22 h.

Analysis of the UT Product by Non-Denaturing PAGE

Gels were prepared containing 6% acrylamide (19:1 acrylamide:bis, Bio-Rad) in 1 × TBE. Samples were diluted 1:19 in sterile distilled water and then mixed 1:1 with loading buffer (40% sucrose, 0.1% bromophenol blue, 0.1% xylene cyanol) such that the amount was equivalent to about 5 ng DNA (0.25 µl PCR product) and loaded onto the gel without denaturation. Electrophoresis was carried out in a cold room (4–10 °C) at 40 mA/gel and with limiting voltage of 400 V, for 2.5 h until the xylene cyanol marker had run off the gel.

Restriction Fragment Length Polymorphism (RFLP)

Two different methods were used. For Southern blot analysis of MspI-digested genomic DNA from the CEPH families, the LCT3 PCR product was used as probe. The PCR product was gel purified and labelled using a Multiprime kit (Amersham). The Hybond N+ (Amersham) filters (prepared by EUROGEM) were prehybridised and hybridised as recommended by the manufacturer. Alternatively LCT3 or F17/LCT PCR products from the family members were digested with MspI and the digestion products analysed by non-denaturing PAGE and silver staining.

Silver Staining

The gels were fixed in 10% ethanol and 0.5% acetic acid using two 3-min incubations, then incubated in 0.1% silver nitrate (freshly made) for 10 min. They were then washed in two changes of distilled water and incubated in staining solution (375 mM sodium hydroxide, 2.6 mM sodium borohydride and 0.148% formaldehyde) until the bands were visible (maximum 20 min). Gels were subsequently vacuum dried and stored flat.

Sequencing

The 5F and F2 PCR products were sequenced by the dideoxy chain termination method [17] using the Sequenase kit (USB). Single-stranded template was prepared by biotinylation of one strand and separation on streptavidin-coated magnetic beads. In the case of F2 the PCR products (1 µl) were reamplified using 5 pmol of the same primers, one of which was biotinylated. In the case of the 5F product the initial PCR product was gel purified and then reamplified using sense or antisense primers located in the Alu element (AluS and AluA; table 1) together with biotinylated 5FS or 5FA. Strands were separated on Dynabeads (M-280, Dynal) in 0.1 M NaOH and both the biotinylated strand, which was attached to the beads, and the NaOH eluate were sequenced using 2 pmol (AluS, AluA, F2S, F2A) or 5 pmol (5FS, 5FA) as primer.

Linkage Analysis and Determination of Haplotypes of Alleles in LCT

Lod scores were calculated from the equations described by Maynard-Smith et al. [18], using the computer program HANDLINK [J. Attwood, personal commun.]. Unambiguous haplotypes were determined for 240 chromosomes in the CEPH pedigrees by analysis of the joint segregation of the alleles detected in each fragment. This included information on the other chromosome from each of the grandparents where this was available. Chromosomes for which the information was incomplete were excluded from the analysis. In those cases where the families are known to be related the duplicated chromosomes were counted only once.
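For orientation, the simplest textbook form of a lod score is sketched below in Python. This is not the HANDLINK program and not the sibship equations of Maynard-Smith et al. [18]; it assumes fully informative, phase-known meioses, and the function name `lod_phase_known` is hypothetical.

```python
import math


def lod_phase_known(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """Lod score log10[L(theta) / L(0.5)] for phase-known informative meioses.

    L(theta) = theta**k * (1 - theta)**(n - k), with k recombinants in n meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("recombinants must lie between 0 and the number of meioses")

    n, k = n_meioses, n_recombinants
    if theta == 0.0:
        if k > 0:
            # Any observed recombinant is impossible at theta = 0.
            return float("-inf")
        log_likelihood = 0.0
    else:
        log_likelihood = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)

    # Subtract the log-likelihood under free recombination (theta = 0.5).
    return log_likelihood - n * math.log10(0.5)


if __name__ == "__main__":
    # With no recombinants, each informative meiosis adds log10(2) at theta = 0.
    print(lod_phase_known(10, 0, 0.0))
```

In this simplified setting, each informative meiosis without recombination contributes log10(2), about 0.3, to the lod score at θ = 0.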

Calculation of Linkage Disequilibrium

Linkage disequilibrium (D) for pairs of sites was measured by calculating the deviation of the observed frequency of the haplotype from that expected from multiplication of the individual allele frequencies and expressed as a ratio of Dmax (D/Dmax). Dmax (the maximum possible disequilibrium for a given pair of allele frequencies) was taken as the minimum value of rs or (1 − r)(1 − s) where D < 0 or the minimum value of r(1 − s) or s(1 − r) where D > 0, where r is the frequency of the rarer allele at one site and s is the frequency of the rarer allele at the second site [19, 20]. The significance of the difference of D/Dmax from 0 was calculated as a χ² with 1 d.f. using the equation D²N/[r(1 − r)s(1 − s)] [19].
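The calculation just described can be written out as a short Python sketch. The function name `linkage_disequilibrium` and its argument names are hypothetical, the observed haplotype is taken to be the one carrying the rarer allele at both sites, and N is taken to be the number of chromosomes; the numbers in the usage example are invented for illustration and are not data from this study.

```python
def linkage_disequilibrium(p_hap: float, r: float, s: float, n_chromosomes: int):
    """Return (D, D/Dmax, chi-square with 1 d.f.) for a pair of biallelic sites.

    p_hap: observed frequency of the haplotype carrying both rarer alleles
    r, s:  frequencies of the rarer allele at the first and second site
    """
    if not (0.0 < r < 1.0 and 0.0 < s < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")

    # Deviation of the observed haplotype frequency from the product expectation.
    d = p_hap - r * s

    # Maximum possible disequilibrium depends on the sign of D.
    if d < 0:
        d_max = min(r * s, (1 - r) * (1 - s))
    elif d > 0:
        d_max = min(r * (1 - s), s * (1 - r))
    else:
        d_max = 0.0

    ratio = d / d_max if d_max > 0 else 0.0

    # chi-square = D^2 N / [r(1 - r) s(1 - s)]
    chi2 = d * d * n_chromosomes / (r * (1 - r) * s * (1 - s))
    return d, ratio, chi2


if __name__ == "__main__":
    # Invented example: two rarer alleles of frequency 0.2 always found together.
    d, ratio, chi2 = linkage_disequilibrium(p_hap=0.2, r=0.2, s=0.2, n_chromosomes=100)
    print(d, ratio, chi2)
```

In the invented example the two rarer alleles are completely associated, so D/Dmax is 1 (up to floating-point rounding) and the χ² statistic equals the number of chromosomes.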

Results

Several short regions of the lactase gene, including some in which base changes had previously been reported [6, 7], were amplified using the PCR technique. Three of these regions were found to show polymorphism in a preliminary screen of a test population of 8–10 individuals and were therefore studied further. One region spans an Alu element 5′ to the promoter of the lactase gene and the other two contain exon sequences. A previously reported MspI polymorphism was localised and detected, either by digestion of the appropriate PCR product with MspI, or by Southern blot analysis of MspI-digested DNA using the PCR product as probe.

Analysis of the 5′-Flanking Region of the Lactase Gene (5F PCR Product)

The 534-bp 5F product was digested with AvaII to give two fragments of 310 and 224 bp. These same digests were used for both SSCA and DGGE analysis.

SSCA

Initially both denatured and non-denatured samples were tested in order to distinguish the double- and single-stranded fragments. Under the conditions used the single-stranded fragments migrated more slowly than the double-stranded fragments (fig. 1). We noted that the single- and double-stranded DNA give slightly different colours when silver stained: the double-stranded DNA is browner and the single-stranded more orange. The single-stranded bands corresponding to each of the digestion products were distinguished by separate analysis of each of the products of the AvaII digestion (data not shown). The faster migrating components correspond to those produced from the smaller fragment. Initial screening of these fragments was carried out using the eight different gel conditions. Allelic variation was detected in the smaller single-stranded fragment on all four gel compositions when they were run in the cold room (sample 5, fig. 1). Variation was also detected in the larger fragment but only on gels containing glycerol (sample 2, fig. 1). All subsequent gels contained 5% glycerol, 1 × TBE and electrophoresis was conducted in the cold.

Fig. 1

Photographs of SSCA of the same series of samples under two different electrophoretic conditions (with glycerol and without glycerol, 1 × TBE in the cold). The 5F fragment was digested with AvaII and the positions of the small (S) and large (L) fragments are indicated. The single-stranded components are indicated by SS. The bands were visualised after silver staining.

The variation detected in the smaller fragment was named ‘1/2’ (frequencies 1, 0.93 and 2, 0.07) and that in the larger fragment was named ‘1/3’ (frequencies 1, 0.98 and 3, 0.02). Polymorphism was also evident in the double-stranded DNA of the smaller fragment but this was clearly independent from the 1/2 allelic variation detected in the single-stranded DNA. The two alleles were named F and S (fast and slow) to describe their relative electrophoretic mobility. The individuals with 3 bands (lanes 2 and 5, fig. 1) are putative S/F heterozygotes. The extra bands can be attributed to heteroduplex formation between allelic strands which differ at a site which does not alter the mobility of the double- or single-stranded DNA under the conditions tested.

DGGE Analysis

Use of the computer programmes MELT87 and SQHTX [21] demonstrated two melting domains within each of the AvaII digestion products of the 5F PCR product. Since the GC-rich, higher melting domain would behave as a natural ‘GC clamp’ the same AvaII digests were analysed directly, without using primers containing a synthetic GC clamp. Initially, the digests were analysed using a gradient of 20–60% denaturant and clear evidence of genetic variation was obtained. The conditions were then optimised for the detection of these variations by the use of a narrower gradient and an experimental time course. The S/F polymorphism, observed in the double-stranded DNA on the SSCA gels, was resolved unequivocally by DGGE (lower bands, fig. 2a) as was the 1/3 polymorphism in the larger fragment (not shown). DGGE revealed additional polymorphism in the larger fragment, not seen by SSCA, which we have called 1/4 (upper bands, fig. 2a).

Fig. 2

Photographs of the analysis of AvaII digests of the 5F PCR product by DGGE (a) and SSCA (b) of the CEPH family 1447. The relationship between the people is shown in the pedigree above the gels. The tracks below a symbol in the pedigree correspond to DNA from that individual. The phenotypes assigned by DGGE analysis (a) were: father 1, S/F; mother 1/4, S/F; mother’s mother 1, S; the last child 1/4, F (representative individuals). The phenotypes for the variation detected by SSCA (b) were: father 1/2, and mother 1. In individuals heterozygous for the 1/4 or S/F polymorphisms the fainter pair of bands of lower mobility may be explained as heteroduplexes between the two alleles. These were not seen in all cases since the samples were not treated to promote heteroduplex formation. The bands were visualised after silver staining.

The allele frequencies observed in the CEPH population for the 1/4 polymorphism were 1, 0.85 and 4, 0.15, and for the S/F polymorphism S, 0.76 and F, 0.24. DGGE analysis and SSCA of the same series of samples from a single family are shown in figure 2.

Sequence Analysis of the 5F PCR Product

Sequencing of the 5F PCR products from 7 individuals of different phenotypes identified the base changes responsible for the alterations in mobility of the fragments detected by SSCA and DGGE and confirmed the existence of four different polymorphic sites. The data are summarized in table 2. The site responsible for the 1/3 polymorphism was identified by the analysis of four different heterozygous individuals, because this allele has so far not been found in homozygous form. In all cases the allele was shown to carry a T at nucleotide position −957 (like the 4 allele) in addition to the substitution at position −874.

Table 2 Nucleotide differences responsible for the polymorphisms in the LCT gene

SSCA of Exon 2

The F2 PCR product which spans exon 2 was analysed under a variety of conditions with the aim of revealing the sequence polymorphism described previously at nucleotide 666 [6]. This polymorphism was revealed by use of 0.5 × TBE and glycerol in the gels and electrophoresis in the cold. An example of this polymorphism is shown in figure 3. Each homozygote shows a pattern of three single-stranded bands. The heterozygote phenotype appears to be a straightforward combination of the homozygote patterns. The observation of three bands corresponding to each of the alleles suggests that one of the strands can form two equally stable conformers. Sequence analysis confirmed the nucleotide substitution at position 666 (table 2) and showed that allele A corresponds to the presence of a G and allele B an A (frequencies A, 0.83 and B, 0.17).

Fig. 3

Photograph of a gel showing SSCA of the F2 PCR product in samples from CEPH family 17. Bands corresponding to the double- and the single-stranded DNA are indicated. The genotypes shown are deduced from the family structure. The track labelled 1 kb contains the kilobase ladder molecular weight markers (BRL). The bands were visualised after silver staining.

Analysis of an MspI RFLP in Exon 17

Examination of the distribution of MspI sites and the base changes observed in the published sequences [6, 22] suggested that the previously reported MspI RFLP [14] might be due to variation at an MspI site in exon 17. This hypothesis was tested by MspI digestion of PCR products (LCT3 and F17/LCT) spanning the relevant region. In each case, two alleles were observed, one where the PCR product was not digested and the other in which it was digested. The PCR product LCT3 generates digestion fragments of 184 and approximately 1,250 bp, whereas the F17/LCT product gives fragments of 184 and 60 bp (fig. 4). The LCT3 PCR product was also used as a probe on Southern blots of MspI-digested genomic DNA. Two bands of approximately 5–6 kb were distinguished which differed in size by approximately 200–300 bp, consistent with the previously reported polymorphism. The CEPH samples were, in most cases, tested by Southern blot analysis of MspI-digested genomic DNA and probing with the undigested PCR product. Some samples were analysed by digestion of the LCT3 PCR products. The results obtained were in complete agreement with the MspI polymorphism data already on the CEPH data base [Kruse et al., unpublished]. In the few cases where samples were temporarily unavailable to us, the results already on the data base were used to complete the haplotypes. The allele frequencies observed were 0.78 for MspI+ and 0.22 for MspI−.

Fig. 4

Photograph of a gel showing the MspI polymorphism in the F17/LCT fragment. The bands were visualised after silver staining.

Analysis of the UT Product (Exon 17)

The previously reported deletion (Δ)/insertion (I) of two base pairs at nt 6,236/7 [6] was found to be distinguishable by simple non-denaturing polyacrylamide gel electrophoresis, provided that the amount of DNA loaded was reduced to approximately 5 ng. It is noteworthy that heteroduplexes can be detected in the heterozygotes. An example of this analysis is shown in figure 5. The allele frequencies observed were 0.83 for I and 0.17 for Δ.

Fig. 5

Photograph of a gel showing the three phenotypes detectable for the Δ/I polymorphism in the UT PCR product from a selection of unrelated individuals. I indicates the presence of the insertion and Δ the deletion. Approximately 5 ng of PCR product was loaded in each track. The bands were visualised after silver staining.

Haplotype Determination

Analysis of each of these polymorphisms in the CEPH families confirmed that they are inherited in a Mendelian fashion. All seven polymorphisms were linked, showing a high lod score and no recombination. In the case of the 5F polymorphisms and the MspI polymorphism, which are located at opposite ends of the gene, all individuals in the informative families were tested and the lod score was 32 at θ = 0. Subsequent analysis of the haplotypes was conducted assuming no recombination. Analysis of the variation in the 5F fragment revealed that the 4 allele and the 2 allele always occurred on different chromosomes (fig. 2) but that these each showed complete association with the F allele at the fourth site in this fragment (S/F), thus generating three haplotypes (table 3a). The rarer 3 allele was shown to be related to the 4 allele by carrying an additional nucleotide substitution in the same DNA fragment and generated a fourth haplotype. Clear patterns of associations were also found with and between each of the other individual sites (table 3a, b). In particular it is noteworthy that the S/F polymorphism (in the 5′-flanking region) shows the highest level of association with the MspI polymorphism (exon 17), while the F2 polymorphism (exon 2) shows the highest level of association with the UT polymorphism (exon 17). Although this analysis revealed a number of rare haplotypes, there were only three common haplotypes in the CEPH population, with the haplotype that carries the 3 allele being the fourth most frequent. All the haplotypes observed are shown in table 4 together with the ten haplotypes that might have been expected but were not observed. It can be seen that the frequencies of these haplotypes deviate significantly from those expected by random association of the alleles. The linkage disequilibrium parameter (D/Dmax) calculated for each pair of sites is depicted in figure 6 and reveals that there is a high level of disequilibrium across the region with no hint of any correlation with distance.

Fig. 6

Diagrammatic representation of the linkage disequilibrium (expressed as D/Dmax) across the lactase gene. At the top is a schematic representation of the lactase gene showing the gene structure and the positions of the polymorphic sites. Shown below are the values obtained for D/Dmax for the pairs of sites. The 3 allele is excluded from this analysis because the sample size is too small. * Not significantly different from 0 (p > 0.01).

Table 3a Associations between the alleles located at different sites in the LCT gene: association of the 2, 3 and 4 alleles at three of the polymorphic sites in the 5F fragment with the F and S alleles at the fourth site in this fragment
Table 3b Pairwise comparisons of the alleles at the 6 most polymorphic sites
Table 4 Frequencies of the haplotypes observed in the CEPH population in comparison with those expected from random assortment of the alleles

The CEPH families which share the common characteristic of large sibships come from various sources. They comprise a group from France, a large group collected by Dr. Ray White’s laboratory in Utah and a few assorted others. It was very noticeable that the frequencies of the two commonest haplotypes A and B differ significantly (p < 0.001, by Fisher’s exact test) in the two major groups (table 5).

Table 5 Comparison of the frequencies of the haplotypes observed in the two major sub-populations of the CEPH series

Discussion

Seven different polymorphisms in the lactase gene were analysed in this study. Their relative positions in the gene can be seen in figure 6. Four sites were in the 534-bp region (5F) located between −997 and −464, upstream from the start of transcription. Previous sequence analysis had revealed two sites within this region that showed sequence differences in 1 of 10 chromosomes examined [7]. This study shows that these substitutions (−957C/T and −552/9AA/A) are indeed responsible for the variation we detected and correspond to the 1/4 and S/F polymorphisms, respectively. The 1/3 and 1/2 polymorphisms, on the other hand, represent new sites (−875A/G and −678A/G, respectively). Sequence analysis of exon 2 by Boll et al. [6] previously demonstrated a polymorphic site at nucleotide position 666 in the cDNA, 7 chromosomes possessing G and 4 having A at this site. This polymorphism results in a valine to isoleucine change in the pre-pro-protein (V/I219). The SSCA conditions described here allow the simple detection of this polymorphism, allele A corresponding to the presence of a G at this site and allele B corresponding to an A. Boll et al. [6] also observed a GT duplication 7 nucleotides upstream of the putative polyadenylation signal in 7 of the 11 chromosomes sequenced. The simple PAGE conditions described here allow the detection of this variation as a mobility difference. We also located the previously reported MspI polymorphism to exon 17, which provides another positioned marker in the lactase gene.

Analysis of these 7 polymorphisms has allowed the frequencies of these alleles and the haplotypes to be determined using unrelated individuals of the CEPH panel. In the 240 chromosomes analysed only 9 of the possible 128 (2⁷) haplotypes were observed and only 3 of these were common. Ten other haplotypes predicted to occur at high frequency were not seen. The region of linkage disequilibrium extends across the whole gene (60–70 kb).
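The expectation under random association can be illustrated with a short Python sketch that enumerates all 128 combinations of alleles at the seven biallelic sites and multiplies the allele frequencies reported in the Results. The site labels, the names `ALLELE_FREQS` and `expected_haplotypes`, and the ordering of sites are choices made for this illustration; this is not a reconstruction of how table 4 was actually computed.

```python
from itertools import product

# Allele frequencies as reported in the Results (CEPH population).
ALLELE_FREQS = {
    "1/4": {"1": 0.85, "4": 0.15},
    "1/3": {"1": 0.98, "3": 0.02},
    "1/2": {"1": 0.93, "2": 0.07},
    "S/F": {"S": 0.76, "F": 0.24},
    "F2": {"A": 0.83, "B": 0.17},
    "MspI": {"+": 0.78, "-": 0.22},
    "UT": {"I": 0.83, "del": 0.17},
}

N_CHROMOSOMES = 240  # unambiguous chromosomes analysed


def expected_haplotypes(allele_freqs: dict) -> dict:
    """Map each allele combination to its frequency under random association."""
    sites = list(allele_freqs)
    expected = {}
    for combo in product(*(allele_freqs[site].items() for site in sites)):
        alleles = tuple(allele for allele, _ in combo)
        freq = 1.0
        for _, allele_freq in combo:
            freq *= allele_freq
        expected[alleles] = freq
    return expected


if __name__ == "__main__":
    expected = expected_haplotypes(ALLELE_FREQS)
    ranked = sorted(expected.items(), key=lambda item: item[1], reverse=True)
    print(len(expected), "possible haplotypes")
    for alleles, freq in ranked[:10]:
        # Expected count among the chromosomes analysed.
        print(alleles, round(freq, 4), round(freq * N_CHROMOSOMES, 1))
```

Comparing such expected counts with the observed haplotype counts is one way to see the deviation from random association described above.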

The particularly high level of association between alleles in the F2 (666A/G) and UT (6,236/7GT/ΔΔ) fragments and between S/F in the 5F fragment (−552/9A/AA) and the MspI site (5,579C/T) is interesting, since these two regions overlap and each spans 50–60 kb. These overlapping associations mean that it is not easy to hypothesise the possible evolutionary phylogeny of the three common haplotypes. There is no suggestion that reciprocal recombination was involved, and it is tempting to implicate gene conversion. However, the D haplotype presumably arose from the B haplotype due to an additional, more recent, point mutation at nt −875.

It was noteworthy that the B haplotype differs markedly in frequency between the French and Utah families. It was thus also of some interest that the distribution of the B haplotype among the Utah families was uneven, 5 of the 17 chromosomes being found in one family. Most of the Utah families come from the local community. This relatively recent population, which originated largely from deliberate colonisation by the Church of the Latter Day Saints (Mormons) during the second half of the last century, came mainly from eastern USA, UK and Scandinavia. The pronatalist policy of the Mormon Church encourages large sibships and has made available many suitable families for genetic study. The studies of McLellan et al. [23], based on the analysis of multiple polymorphic enzyme markers and blood groups, have indicated that the Utah population is representative of northern Europe. It was thus of some considerable interest to discover from Dr. R. White and Dr. M. Leppert that the family which carries 5 B haplotype chromosomes is one of a few families that was not collected in Utah and is probably of central European origin.

The existence of a large region of linkage disequilibrium means that if the sequence which determines the lactase persistence polymorphism is indeed cis-acting it is possible that lactase persistence will show some association with the DNA polymorphisms even if the relevant sequence is located at some distance from the gene itself. Association studies are therefore underway, as well as other genetic studies, to locate the polymorphism which determines lactase persistence status. It will be of interest to determine the haplotypes in other populations and also in higher primates, since this may help towards an understanding of the evolutionary history of the lactase gene and the lactase persistence polymorphism.