Introduction

The LDL receptor is a cell surface glycoprotein, which has a focal role in regulating low-density lipoprotein cholesterol (LDL-C) levels in humans, as it is the major protein involved in cholesterol transport. The receptor mediates the specific binding, uptake and degradation of plasma lipoproteins containing either apolipoprotein B or apoE. Remnants of triglyceride-rich lipoprotein metabolism (also known as intermediate-density lipoprotein), which are the precursors of plasma LDL and LDL itself, are recognized by the LDL receptor. It regulates plasma cholesterol by mediating the endocytosis of LDL-C by means of clathrin-coated pits.1 An elevated LDL-C level is a major risk factor for the development of coronary artery disease through the role it has in the etiology of atherosclerosis.2 The LDL-C levels of the black South African population are thought to be historically low3 and the prevalence of coronary artery disease is minimal.4 Genetic variation influences LDL-C levels in a population and 50% of the interindividual variability is due to this genetic component.5 A large number of genetic variants have been identified that cause a range of LDL-C levels in various populations.6 Some variants, such as the single-nucleotide polymorphism (SNP) found in the promoter region of the LDLR gene, are able to increase the promoter activity by 50% and cause lower LDL-C levels.7 The question arises whether there are genetic variants present in the South African Prospective Urban and Rural Epidemiological (PURE) study population that contribute to the different levels of LDL-C, and to what degree these variants associate with LDL-C levels. The aim of this study was thus to determine whether single-nucleotide polymorphisms (SNPs) in the LDLR gene are present in the study population and to what extent these variants associate with LDL-C.

Materials and Methods

Study population

The study population is part of the South African PURE study and consists mainly of self-identified Tswana-speaking individuals. The PURE study is a prospective cohort study that follows changes in lifestyle, risk factors and chronic disease by means of periodic standardized data collection in two urban and two rural areas in the North West Province of South Africa over a period of 12 years. The central hypothesis of the study states that lifestyle (physical activity and nutrition) and psychosocial transitions secondary to urbanization and industrialization lead to behaviors that cause the development of chronic noncommunicable diseases, such as cardiovascular disease (CVD), diabetes, high blood pressure and obesity. Two thousand eligible volunteers between the ages of 35 and 60 years were randomly selected. To avoid bias from population stratification, the urban and rural cohorts were matched for ethnicity, gender and age. The design of the PURE study has been described previously in greater detail.8 DNA samples for 1860 of the volunteers were available for analysis.

Biochemical analysis

Blood was sampled after 8 h of fasting. Serum lipid (high-density lipoprotein and triglyceride) measurements were carried out by a Sequential Multiple Analyzer Computer using a Konelab™ auto analyzer (Thermo Fisher Scientific, Vantaa, Finland). Plasma LDL-C levels were determined by using the Friedewald equation (LDL-C = TC − HDL-C − TG/2.17),9 where TC is total cholesterol, HDL-C is high-density lipoprotein cholesterol and TG is triglycerides, all in mmol l−1. LDL-C was treated as missing for individuals with a triglyceride measurement of >4.5 mmol l−1.
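The Friedewald calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the constant name and the example lipid values are hypothetical, and only the formula and the 4.5 mmol l−1 triglyceride cutoff come from the text.

```python
from typing import Optional

# Triglyceride level (mmol/l) above which LDL-C is treated as missing (from the text).
TG_CUTOFF_MMOL_L = 4.5


def friedewald_ldl_c(tc: float, hdl_c: float, tg: float) -> Optional[float]:
    """Estimate LDL-C (mmol/l) as TC - HDL-C - TG/2.17.

    Returns None when the triglyceride value exceeds the cutoff,
    mirroring the 'missing value' rule described in the text.
    """
    if tg > TG_CUTOFF_MMOL_L:
        return None
    return tc - hdl_c - tg / 2.17


# Hypothetical example values, not taken from the study data.
print(friedewald_ldl_c(tc=5.0, hdl_c=1.3, tg=1.5))
print(friedewald_ldl_c(tc=6.2, hdl_c=1.0, tg=5.1))
```

Returning None rather than a sentinel number keeps excluded individuals from being silently averaged into summary statistics.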

DNA analysis

Informed consent was obtained from each participant in the PURE study for the collection of blood samples and ethical approval for the study was given by the Ethics Committee of the North-West University (Ethics number: NWU-00016-10-A1). DNA isolation was performed by using Qiagen Flexigene DNA extraction kits (QIAGEN, Valencia, CA, USA; catalog number 51206).

A sub-sample of 30 randomly chosen individuals was used to identify known and novel mutations in the exons and flanking regions of the LDLR by means of Sanger sequencing. Thirty individuals give a total sample size of 60 chromosomes and, as reported by Collins and Schwartz,10 using 40–65 chromosomes will detect polymorphisms at a frequency of 0.05 with a power of 0.80–0.95. Polymerase chain reaction amplification of the proximal promoter region, the exons and the exon–intron boundaries of the LDLR gene was performed on an iCycler (Bio-Rad, Hercules, CA, USA). Primers were designed to amplify and sequence the various regions of the LDLR, and these are described in supplementary documentation (Supplementary Table 2). NucleoFast 96 PCR Clean-Up Kits from Macherey-Nagel (Düren, Germany) were used for the clean-up of the polymerase chain reaction product prior to sequencing. Automated cycle sequencing was performed according to the basic principles of Sanger sequencing.11 The standard ABI Prism BigDye Terminator version 3 Ready Reaction Cycle Sequencing Kit supplied by Applied Biosystems (Carlsbad, CA, USA) was used for the sequencing reactions. The purified product underwent electrophoresis on a 50 cm capillary either on an ABI3730xl DNA analyzer or an ABI3120xl genetic analyzer according to standard protocols.
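A rough way to see why about 60 chromosomes are enough to pick up a variant at a frequency of 0.05 is to compute the chance that at least one copy of the allele appears in the sample. The sketch below uses a simple binomial argument, which is an assumption made for illustration and not necessarily the calculation used in the cited reference, so its values need not match the quoted power range exactly. The function name is hypothetical.

```python
def detection_probability(allele_freq: float, n_chromosomes: int) -> float:
    """Probability of observing at least one copy of an allele.

    Assumes chromosomes are sampled independently, so the chance of
    missing the allele on every chromosome is (1 - allele_freq) ** n.
    """
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


# Sample sizes spanning the 40-65 chromosome range mentioned in the text.
for n in (40, 60, 65):
    print(n, round(detection_probability(0.05, n), 3))
```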

Genotyping

The sequences that were generated were examined for the presence of variants using the CLC DNA Workbench version 6.0.2 (Aarhus, Denmark). Variants were confirmed by repeat sequencing in the opposite direction. Seventy variants were identified within both the coding and noncoding areas of the LDLR gene, including one insertion (a previously unreported 1 bp insertion, c.*1218_1219Gins, in the 3′ untranslated region (3′UTR)), one deletion (rs17249078) and 68 SNPs (these are available in Supplementary Table 1). Five of the SNPs were novel. Genotyping of the population was done by means of the Illumina VeraCode GoldenGate Genotyping Assay on a BeadXpress platform. All 68 SNPs (rs numbers for known SNPs and sequences for novel SNPs) were sent to Illumina Technical Support for evaluation using the Assay Design Tool. Each SNP was scored (from 0 to 1) by the Assay Design Tool on the basis of its compatibility with successful GoldenGate genotyping. SNPs with a score above 0.4 were eligible for genotyping. Fourteen SNPs were excluded because they were located less than 60 bp from another SNP (a minimum spacing of 60 bp is a prerequisite for the assay design) and 10 SNPs had a design score of less than 0.4. Twenty-nine SNPs were then selected on the basis of their frequency in the sub-sample, novelty and whether they were previously reported to associate with LDL-C. A further four SNPs (rs3745677, novel 5, rs2304182, rs5742911) failed during the execution of the assay on the BeadXpress platform. The study population was finally genotyped for 25 SNPs (of which two were novel). SNPs with a gene call score of >0.5 and samples with a call rate of at least 0.9 were included in the analysis. The A of the initiating ATG codon is designated as nucleotide c.1 and the reference sequence LDLR: NM_001195798.1 was used.
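The SNP selection steps above can be tallied in a few lines to check that the counts are consistent. The variable and function names are hypothetical; the numbers and the two design criteria (score above 0.4, at least 60 bp from another SNP) are taken from the text.

```python
def passes_design(design_score: float, distance_to_nearest_snp_bp: int) -> bool:
    """Apply the two assay design criteria described in the text."""
    return design_score > 0.4 and distance_to_nearest_snp_bp >= 60


snps_identified = 68       # SNPs found by sequencing the sub-sample
too_close = 14             # less than 60 bp from another SNP
low_design_score = 10      # design score below 0.4
selected_for_assay = 29    # chosen on frequency, novelty and prior reports
failed_on_platform = 4     # failed on the BeadXpress platform

eligible = snps_identified - too_close - low_design_score
genotyped = selected_for_assay - failed_on_platform

# The selected set must fit within the SNPs that passed the design filters.
assert selected_for_assay <= eligible
print(eligible, genotyped)
```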

Haplotype frequencies

Haplotype frequencies and haploblocks were determined by means of the software package Haploview version 4.2.25 (Broad Institute, Cambridge, MA, USA). Markers with a Hardy–Weinberg P-value above 0.05 and a minor allele frequency of at least 1% were included in the analysis. Association analyses were conducted using LDL-C of ≤3 mmol l−1 for controls and >3 mmol l−1 for cases.

Measurements

Anthropometric measurements such as weight (kilograms), height (meters) and body mass index (weight in kilograms divided by the square of height in meters) were supervised by a level 3 anthropometrist according to guidelines adopted at the Airlie Conference sponsored by the National Institutes of Health. Standardized and calibrated equipment was used for measurements on volunteers wearing minimal clothing.12
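As a small worked example of the body mass index definition above, the following sketch computes it from weight and height. The function name and the example values are hypothetical.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical volunteer: 70 kg and 1.75 m.
print(round(body_mass_index(70.0, 1.75), 1))
```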

Diet

Dietary intakes of the volunteers were determined by means of previously developed13 and validated14 culturally sensitive quantified food frequency questionnaires. The nutrient intakes were analyzed by using the Foodfinder3 program (Medical Research Council, Tygerberg, South Africa), which is based on the South African food composition tables.15

Statistical analysis

Descriptive statistics on the study population were computed using the statistical program SPSS version 20 and are expressed as means and s.d. The observed genotype frequencies were tested for deviation from Hardy–Weinberg equilibrium by using the χ2 test of independence. Analyses of variance were used to test the association between different genotypes and LDL-C levels. Adjustments for the confounders age, gender and body mass index were made in all analyses. The Bonferroni correction was used to account for multiple testing. P-values of 0.05 or lower were considered statistically significant.
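The Hardy–Weinberg check and the multiple-testing correction mentioned above can be illustrated with a short sketch. It assumes the common formulation in which observed genotype counts are compared with the counts expected from the allele frequencies, using a χ2 statistic with one degree of freedom; the text does not give the exact computation performed in SPSS, and the function names and example counts are hypothetical. Using 25 tests in the Bonferroni example is also an assumption based on the number of genotyped SNPs.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test for deviation from Hardy-Weinberg equilibrium.

    Takes counts of the two homozygotes (n_aa, n_bb) and the
    heterozygotes (n_ab); returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# Hypothetical genotype counts that match Hardy-Weinberg proportions exactly.
print(hwe_chi_square(25, 50, 25))
# Assumed example: 25 SNPs tested at an overall alpha of 0.05.
print(bonferroni_threshold(0.05, 25))
```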

Results

The characteristics of the study population are shown in Table 1. The mean LDL-C levels for the men were 2.70 mmol l−1 (s.d. 1.12) and for the women, 3.06 mmol l−1 (s.d. 1.17). The mean age for both men and women was similar at 49.9 years (s.d. 10.4) and 49.1 years (s.d. 10.4), respectively. Twenty-five of the 70 variants identified by bidirectional sequencing of polymerase chain reaction products were finally genotyped for this study population. Two of these variants were novel and have not been previously reported.

Table 1 Population characteristics

The minor allele frequencies of the study population were compared with those of the European and African populations from the 1000 Genomes Project (Table 2). The following rare variants were absent in the European population: rs17249141, rs148698650, rs116405216 and rs72658862. All four of these variants were present in the study population but, according to the 1000 Genomes database, rs148698650 is not present in the African populations. Rare and common variants such as rs17242759, rs10423288, rs11669576, rs5929, rs4508523, rs6413503, rs3826810 and rs2738464 have a minor allele frequency that is greater in the study population and the African population than in the European population. Variants whose minor allele frequency is greater in the European population than in the study population and the other African populations are as follows: rs2228671, rs5930, rs2738447, rs688, rs5925, rs14158, rs1433099 and rs3180023. Three common variants, rs1569372, rs5927 and rs2738465, have similar minor allele frequencies across the three different populations.

Table 2 Minor allele frequency of the various SNPs in LDLR in different populations

Variant c.-314 C→T (rs369850745) was found in the promoter region of the gene but the association with LDL-C levels was not statistically significant (P=0.85) (Table 3). The other novel SNP found in intron 15, c.2311+9 T→G (rs369402076), was also not associated with LDL-C levels. For both variants, only heterozygote carriers were detected and both variants were in Hardy–Weinberg equilibrium (HWE). Both novel SNPs had a very low minor allele frequency (0.01 for rs369850745 and 0.02 for rs369402076). A very rare variant in the promoter region of the LDLR gene, rs17249141 (c.-217 C→T), had a significant association with decreased LDL-C levels (P=0.05). For variant rs5930, detected in exon 10, the mean LDL-C level (3.29 mmol l−1) for the AA genotype was higher than for the other two genotypes, although this was not statistically significant (P=0.09); the mean increase in LDL-C level for the AA genotype was 13.4%. In exon 11, variant rs5929 had a 3.5–5% reduction in LDL-C levels for each mutant allele even though there was no significant association with LDL-C levels. Intron 11 harbored the variant rs2738447, where the AA genotype was significantly associated with elevated LDL-C levels, with a 23.7% increase in mean LDL-C levels when the individual had both mutant alleles (P=0.004). SNP rs5925 was detected in exon 13, where the CC genotype had a mean LDL-C level greater than the TT and the TC genotypes, although the association did not reach statistical significance (P=0.06). In the 3′UTR, rs14158 was significantly associated (P=0.05) with mean LDL-C levels, with an increase of 27.9% when both of the mutant alleles (AA) were present. Variants rs2738465 and rs3180023, also in the 3′UTR, were both significantly associated with elevated LDL-C levels (P=0.05 and P=0.006, respectively). All the SNPs were in HWE except for rs6413503.

Table 3 LDLR genotype distribution and association with LDL-C

Haplotype analysis

Four blocks were identified from the haplotype analysis performed with the software package Haploview version 4.2.25 (results in Table 4). From the association analysis, all four blocks contained haplotypes that were associated with either elevated or lower LDL-C levels. Block 1 revealed two haplotypes, namely GCGGC and GCAAC, where the first haplotype was associated with lower LDL-C (P=0.03) and the second haplotype was associated with increased LDL-C (P<0.01). Block 2 also contained two haplotypes where the CT haplotype was associated with lower LDL-C as more controls presented with the haplotype. The other haplotype in Block 2 was the AT haplotype, which was associated with elevated LDL-C. Three haplotypes were calculated in Block 3 and two of these were associated with elevated LDL-C levels, namely AGG (P=0.03) and GAG (P=0.01). The third haplotype (GGG) in Block 3 was more prevalent among the controls and is therefore associated with lower levels of LDL-C. Only one of the haplotypes (GC) identified in Block 4 was associated with lower levels of LDL-C.

Table 4 Haplotypes associated with LDL-C within the PURE population

Discussion

For this study, we sequenced the exons and flanking intron regions of the LDLR gene to determine the genotypic variation of the population within the gene. Of the 70 variants identified by the bidirectional sequencing, 25 were genotyped within the entire study population, only two of which were novel (rs369850745 and rs369402076). We also performed haplotype analysis to determine whether there were any associations with mean LDL-C levels.

No variants were found in exons three, four and five in the sub-sample of 30 individuals. Together with exons two and six, these exons code for the ligand-binding domain, which is of great importance in binding lipoproteins containing apolipoprotein B or apoE, such as LDL. Mutations within repeats four and five of the ligand-binding domain (present in exon 4) are common in patients with familial hypercholesterolemia (FH).16

We found that four rare variants (rs17249141, rs148698650, rs116405216 and rs72658862) were present within the study population but absent from the European population of the 1000 Genomes database; all except rs148698650 were also present in the 1000 Genomes African populations. SNP rs148698650 was not detected in the 1000 Genomes African samples but it was identified in a woman (heterozygote) in the 1000 Genomes population of Mexican ancestry from Los Angeles. The SNP was originally detected in individuals of Spanish, Cuban and Swedish descent.17, 18, 19 This could possibly be the first report of rs148698650 in an African population. The minor allele frequencies of the SNPs in the study population are more comparable to those of the African populations than to those of the European populations. Variants such as rs688 and rs5925, which were used in a genetic risk score to define the predictive value of genetics in determining cholesterol levels,20 would not be an obvious choice in the study population as their frequencies are much lower than in the European population. The frequencies and associations of variants in different populations should be investigated before they are used in further analyses to ensure their relevance.

The two novel SNPs identified (rs369850745 and rs369402076) showed no association with LDL-C levels, which suggests that they have no meaningful effect on the function of the LDL receptor. Variant rs369850745 is located more than 100 bp from the SREBP1 binding site, which could indicate that the SNP is too far away to have any effect on the transcription of the gene. In intron 15, the variant rs369402076 is most likely located just beyond the region that would influence the splicing of the messenger RNA (mRNA), or the base change does not have any effect on the splice function. Splice donor mutations 2311+2T>G,21 2311+1G>C22 and 2311+1G>A23 are located close to rs369402076 (2311+9T>G) and these variants are associated with hypercholesterolemia.

Sixteen heterozygote carriers of variant rs17249141 (c.-217C>T) were identified in the study population. This variant was first reported by Scholtz et al.7 in 1999 in a subject of mixed ancestry. The promoter activity of this variant increased to 160% of normal in cells depleted of sterol and 50% in cells supplemented with a sterol-containing medium in the transfection study. In this study, the effect of one allele was an 18.6% reduction in LDL-C levels, which is in accordance with an increased transcriptional rate of the gene.

In exon 11, the silent variant rs5929 (c.1617C>T; P539P) was detected; the base change from a cytosine to a thymine is synonymous. As previously reported, the SNP is not associated with LDL-C levels24 and the small lowering effect of the mutant allele in the study population might be caused by a variant with which the SNP is in linkage disequilibrium, or it might be a specific phenomenon related to the study population.

The intronic variant rs2738447 (c.1706-55A>C) had a statistically significant association with elevated LDL-C where the homozygote mutants (genotype AA) had a mean LDL-C level of 3.59 mmol l−1. No genotype–phenotype information could be found for this variant. Analyses of this variant in the Human Splicing Finder indicate that the SNP could form part of various enhancer motifs and therefore influence the splicing efficiency of the pre-mRNA, which leads to alternative splicing.25 We can hypothesize that the function of the different proteins (receptors) formed by the alternative splicing could be less effective in internalizing LDL-C than receptors synthesized by normal splicing, which would then lead to increased plasma levels of LDL-C.25

Variants rs688 (c.1773 C>T) and rs5925 (c.1959 T>C) are both synonymous mutations and have been identified as exon-splicing enhancers associated with increased LDL-C levels by Rafiq et al.26 These variants alter the efficiency of LDLR splicing and therefore have an elevating effect on LDL-C.26 As previously mentioned, these two variants have been used in a genetic risk score to determine the predictive value of genetics on plasma total cholesterol levels, where the results demonstrated that it could be beneficial in identifying individuals with elevated plasma cholesterol levels.20 In the study done by Lu et al.,20 the minor allele frequencies of both these SNPs were 0.43 in their European population. The minor allele frequencies of the SNPs in the PURE population were much lower (rs688=0.04; rs5925=0.19). Rs5925 and rs688 were in strong linkage disequilibrium (LD) in that population, with r2=0.99;20 however, the LD between these two SNPs in the PURE population was much weaker, with r2=0.20. The lower r2 in the study population is also an indication that the two SNPs have been separated through recombination.27 LD between most of the analyzed SNPs (data not shown) is weak when r2 is used as a measure of LD, which is a characteristic of the African genome where more recombination events have taken place.28 A multiallelic D′ of 0.35 computed between haplotype Block 2 and Block 3 also indicates that historical recombination has taken place between the different blocks.27 Rs688 and rs5925 are situated within haploblock 2 (Table 4) as identified by Haploview analyses using the confidence intervals of Gabriel et al.28 Differences in LD among populations are possibly attributable to differences in demographic history,29 as the biological determinants of LD (rates of recombination, mutation, gene conversion) are estimated to be constant across groups.28
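For readers less familiar with the LD measures discussed above, the sketch below computes r2 and D′ for two biallelic SNPs from a haplotype frequency and the two allele frequencies. It uses the standard pairwise definitions and is a simplification: the multiallelic D′ reported between haplotype blocks is computed differently by Haploview. The function name and the example frequencies are hypothetical and are not taken from the study data.

```python
from typing import Tuple


def ld_measures(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Pairwise LD between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (SNP 1) and allele B (SNP 2)
    Returns (r2, d_prime).
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b              # raw disequilibrium coefficient D
    r2 = d * d / denom
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Hypothetical frequencies: complete LD, then no LD.
print(ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5))
print(ld_measures(p_ab=0.25, p_a=0.5, p_b=0.5))
```

Because r2 depends on how closely the two allele frequencies match, two SNPs with very different minor allele frequencies can have a low r2 even when D′ is high.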

The 27.9% increase in LDL-C levels in individuals homozygous for rs14158 is indicative of the role the SNP has in the expression of the gene, as variants found in the 3′UTR of a gene have been proven to affect the regulation of mRNA.30 LDLR gene expression is controlled mainly by regulatory sequences in the 3′UTR as it regulates changes in mRNA stability at the transcriptional and the post-transcriptional level.31, 32 Three mRNA destabilizing elements, known as AU-rich elements, responsible for the rapid turnover of LDLR mRNA, have been identified.32, 33 Variant rs14158 (c.*52G>A) is present in the 2.5-kb-long 3′UTR of the LDLR gene and, together with rs3826810 (c.*141G>A), rs2738464 (c.*315G>C), rs2738465 (c.*504G>A), rs1433099 (c.*666T>C) and rs2738466 (c.*773A>G), it forms a common haplotype reported by Muallem et al.34 Rs2738466 was not genotyped in the PURE study population (the SNP is too close to rs1433099 to be analyzed on the BeadXpress platform) and therefore no comparison could be made between the haplotypes formed within the current study population and those used by Muallem et al.34 However, analysis of the five SNPs that were genotyped revealed seven haplotypes, of which only two had significant associations with LDL-C (Table 4). Haplotype GGCGC was associated with lower LDL-C levels with a mean level of 2.69 mmol l−1 and haplotype AGCAC was associated with elevated LDL-C levels with a mean value of 3.71 mmol l−1 (P<0.01).

The AA genotype (homozygote mutant) of variant rs2738465 was associated with increased levels of LDL-C (P=0.05). The SNP is also found in the 3′UTR, where it forms part of the above-mentioned haplotype, and therefore has a role in the regulation of mRNA. When the mutant allele is included in a haplotype (AGCAC), it is associated with increased LDL-C levels. Two heterozygotes were detected for the very rare SNP, rs3180023, and, despite the small number, the association with LDL-C did reach statistical significance (P=0.006). The variant is also more prevalent in the European population than in the study population (MAF=0.08 vs MAF=0.001). Another rare variant, rs2228671, found in exon 2, is a synonymous mutation where the base change of a cytosine to a thymine does not cause the cysteine at position 27 to change to a different amino acid. Even though the association of the SNP in this population is not statistically significant (P=0.47), it has been reported on in various other populations, such as a German population in which the T allele of the variant was significantly associated with lower LDL-C. LDL-C was 0.19 mmol l−1 lower for every T allele carried in the German population.35 In the PURE population, such an association was not present, but for the TT genotype LDL-C was 22% lower than for the CC genotype. Rs11669576 (A391T; A370T in the original amino acid numbering) was also detected in this study population but with no significant association with LDL-C, as determined in various other studies.36, 37, 38

The haplotype analysis according to the confidence intervals by Gabriel et al.28 revealed four haploblocks. Interestingly, the haplotypes from each block with the highest frequency (GCGGC=0.36; CT=0.69; GGG=0.50; GC=0.22) were associated with lower levels of LDL-C except for Block 4 (Table 4). These haplotypes include the alleles associated with lower LDL-C, as can be seen from the first haplotype in Block 1, where the G alleles from rs5930 and rs1569372 have lower mean LDL-C levels. Of the haplotypes in Block 4, the GC haplotype is the only haplotype for which the association with lower LDL-C reached statistical significance, where the G allele in rs2738465 has a lower mean LDL-C level. The T and C alleles in rs1433099 have similar mean LDL-C levels.

The only variant that was not in HWE was rs6413503, where there was an excess of homozygote mutants as a result of the genotyping assay, and a large number of heterozygotes were not included in the analysis. Because no other variant deviated from HWE, we do not suspect that the deviation is due to genetic drift, non-random mating, selection or population structure.

There are few studies on the association of rare and common SNPs with LDL-C in an African population and these results give us greater insight into the role of genetics in LDL-C at population level in this particular ethnic group. This was, however, a cross-sectional study and functional information for the novel and lesser-known SNPs is not available. Further research into the specific function of each of the SNPs would be of great value in determining to what extent they regulate LDL-C levels. Not all the SNPs identified by the automated sequencing could be genotyped within the entire study population because of restrictions within the genotyping assay; it would be of great value to genotype these SNPs in the study population as well, as 22 of these variants were detected within the 3′UTR. As the study is prospective with a follow-up of 12 years, the predictive value of the significantly associated SNPs can be determined at a later stage. It would also be of great value to determine the prevalence and frequency of SNPs in the intron regions of the LDLR that were not analyzed, as current research indicates the importance of variants found within these regions.

In summary, we have shown that there are SNPs present in the study population that associate with LDL-C levels. For many of these SNPs the prevalence was low and only the individuals harboring these SNPs had significantly higher or lower LDL-C levels. The study population appears to be healthy with normal mean LDL-C levels and the effect of some variants does not raise LDL-C to pathological levels. Unfortunately, the clinical significance of the results cannot be established because of the observational nature of the study. The SNPs that significantly associated with LDL-C levels were situated mostly within regulatory regions such as the promoter area and the 3′UTR, which affect either transcriptional rate or gene expression (regulatory SNPs). Several of the SNPs situated in the coding areas (exons) of the gene were prevalent in only a few individuals and either had an association with elevated LDL-C or had no effect on the levels. In this population it appears, therefore, that the regulatory SNPs make a greater contribution toward LDL-C levels than do SNPs in the coding regions. Interestingly, there were no variants present in exons 3, 4, 5, 9 and 17 of the gene. The absence of variants in these exons could be a reason for the low mean LDL-C as well as the lower prevalence of FH in this population. Exons 4 and 9 have been identified as hot spots for mutations causing FH whereas exons 13 and 15 contain a smaller number of mutations and are classified as FH cold spots.39 In agreement with previous evidence,39 we found only one SNP in each of exons 13 and 15.

Haplotypes were also identified with significant associations, mostly with lower LDL-C, and the low mean LDL-C levels of the study population could possibly be attributed to the presence of these haplotypes in a large percentage of the population (Table 4). This needs to be further investigated, however, to determine whether other confounding factors exist that could result in the low mean LDL-C levels of the study population. Minimal linkage disequilibrium exists between the various SNPs, which is an indication of historical recombination, a characteristic of the African genome.

We also showed that the minor allele frequencies of the identified SNPs in the black South African study population are different from those of the European population, with only a handful of SNPs sharing frequencies across the populations; conversely, the frequencies are very similar to those of other African populations. This indicates that genetic information should be population-specific when used in downstream analysis.

Further research on the other 45 variants identified in the sub-sample of 30 individuals is warranted in order to give us even greater insight into the role that the LDL receptor has in determining LDL-C levels in this study population.