Introduction

Plasma lipoprotein–lipid levels modulate the risk for atherosclerosis and cardiovascular disease (CVD).1 Plasma lipid–lipoprotein levels are under genetic control,2 and genome-wide association studies (GWAS) have revealed several loci associated with plasma lipids.3, 4, 5, 6, 7 However, most of the identified loci have small effect sizes and explain only ~30% of the genetic variance of lipid phenotypes.6 It has been suggested that rare or low-frequency variants with moderate/strong effects that are not captured by GWAS could explain a part of the ‘missing heritability’.8, 9 An effective way to identify these rare/low-frequency variants is to resequence the candidate genes in subjects with extreme phenotypes.10 This strategy has already been successfully employed for various candidate genes involved in lipid metabolism, where multiple rare variants were found to contribute to inter-individual variation in plasma lipid levels.11, 12, 13, 14, 15, 16

Lipoprotein lipase (encoded by LPL) is an enzyme that hydrolyzes triglyceride (TG)-rich particles into free fatty acids and glycerol. High LPL activity is associated with lower TG and higher high-density lipoprotein cholesterol (HDL-C) levels.17 Several common LPL variants were reported to modulate lipid levels and CVD risk.18 Resequencing of LPL in patients with hypertriglyceridemia identified mutations associated with extremely high TG levels.12, 13, 14, 19 However, the impact of rare LPL variants on plasma lipids in the general population remains largely unknown. In this study, we sequenced LPL in 95 African Blacks (ABs) with extreme HDL-C levels and genotyped selected variants in 788 subjects to test their association with lipid levels.

Materials and methods

Subjects

The study sample was comprised of 788 ABs from Benin City, Nigeria (Table 1), recruited as part of a study on civil servants to investigate coronary heart disease (CHD)-related lifestyle factors in this generally lean population. Detailed information on the original study including the sample features and methods used for plasma lipoprotein–lipid measurements can be found elsewhere.20, 21, 22, 23, 24 Ninety-five subjects selected from the upper (n=48) and lower (n=47) 10th percentiles of HDL-C distribution in the study sample were used for LPL sequencing (Table 2).

Table 1 Biometric and quantitative data of the study sample
Table 2 Demographics and characteristics of the sequencing sample (95 African Blacks)

DNA sequencing and genotyping

The entire LPL gene (27 993 bp) plus 1196 bp in the 5′ flanking region and 1 kb in the 3′ flanking region were sequenced in both directions (see Supplementary Data for details). After quality filtering, the sequence chromatograms were individually inspected and variants were manually reviewed by at least two researchers. In addition to selecting all relevant common and rare variants for follow-up genotyping (see section Follow-up genotyping of selected LPL variants in the entire sample), any singleton novel variant surrounded by moderate-quality sequences (with some level of background noise) was treated as 'suspicious' and also included in follow-up genotyping for confirmation. Selected variants were genotyped in the total sample using TaqMan (Applied Biosystems, Waltham, MA, USA) or iPLEX Gold (Sequenom, San Diego, CA, USA) methods (see Supplementary Data for details), except for the HindIII (rs320:T>G) polymorphism, which was genotyped by restriction fragment length polymorphism analysis in an earlier study.16

Statistical analyses

Haploview25 was used to test the concordance with Hardy–Weinberg equilibrium (HWE) and to determine the allele frequencies, linkage disequilibrium (LD) patterns and pairwise correlations (r2). An additive linear regression model was used to test the effects of genotypes on the means of plasma lipid–lipoprotein traits. A nominal P-value of <0.05 was considered as suggestive evidence of association. The Benjamini–Hochberg procedure was applied to control the false discovery rate (FDR) in single-site analyses for each trait, and FDR<0.20 was considered statistically significant.26 A post hoc power analysis27 was performed to evaluate the power of detecting significant single-site associations for the observed proportions of the variance of lipid traits explained by tested SNPs. For haplotype analysis, the generalized linear model28 was applied using the Haplo.Stats R package (Rochester, MN, USA). Cumulative effects of uncommon/rare variants were analyzed using the SKAT-O method29 and three MAF thresholds (≤1%, ≤2% and <5%). Additional details on statistical analyses can be found in Supplementary Data.
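As an illustration of the FDR step described above, the following is a minimal sketch of Benjamini–Hochberg adjustment in plain Python. The function name `benjamini_hochberg` and the example P-values are hypothetical and are not taken from the study; the software actually used for this step is not stated in the text, so this is only one possible way to compute the adjusted values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values
    # never increase as the raw P-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values for five tests of one trait (illustration only).
example_p = [0.002, 0.012, 0.019, 0.5, 0.8]
example_q = benjamini_hochberg(example_p)
# Tests whose adjusted value falls below the 0.20 threshold named in the text.
significant = [p for p, q in zip(example_p, example_q) if q < 0.20]
```

Under this procedure the smallest of m P-values is multiplied by m, so with 67 tests a P-value of 0.002 gives 0.002 × 67 = 0.134, provided no larger P-value yields a smaller bound. This is in line with the FDR reported for the top LDL-C association in the Results, although the exact inputs to that calculation are an assumption here.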

Results

DNA resequencing results

A total of 308 variants were identified, of which 130 were common (MAF≥0.05), 118 were uncommon (0.01≤MAF<0.05) and 60 were rare (MAF<0.01; Figure 1a); 19 were indels and 2 were triallelic SNPs (Supplementary Table 1); 246 were identified in introns, 30 in 3′-UTR, 14 in flanking regions, 12 in coding regions and 6 in 5′-UTR (Figures 1b and c). All identified coding variants were known SNPs: seven nonsynonymous and five synonymous. We successfully identified all but three common variants (rs1470187:G>T, rs59184895:T>C and rs328:C>G) reported in African-descent populations (dbSNP build 138). These variants were located at the beginning or end of resequencing amplicons, where the sequence read quality is usually low in Sanger sequencing, which probably hampered their identification in our sequencing sample; however, they were successfully genotyped in our entire sample. Of 308 variants, 64 (2 common and 62 uncommon/rare) had not been previously reported in dbSNP and have thus been submitted to the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/snp_viewTable.cgi?handle=KAMBOH). Four of the 64 novel variants were located in flanking regions, 3 in 5′-UTR, 6 in 3′-UTR and 51 in introns; the majority (69%) had MAF<0.01. We identified nine novel indels (all located in introns) ranging in size from 1 to 15 bases. In addition to ‘novel variants’, we identified ‘novel uncommon/rare alleles’ at two nucleotide positions where common diallelic variations were previously reported (rs7002728:G>T and rs28599962:T>C); thus, we detected triallelic variations at these positions, where the least frequently observed allele was unique to our sample. Of 308 variants, 24 were found only in the high HDL-C group vs 54 only in the low HDL-C group.
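The frequency classes used above (common, uncommon, rare) can be expressed as a short sketch. The function names `maf_from_counts` and `classify_maf` are hypothetical, and the calculation assumes a diploid autosomal locus with every subject successfully called; only the thresholds come from the text.

```python
def maf_from_counts(minor_allele_count, n_subjects):
    """Minor allele frequency, assuming two alleles per subject."""
    return minor_allele_count / (2 * n_subjects)


def classify_maf(maf):
    """Apply the thresholds from the text: >=0.05, 0.01 to <0.05, <0.01."""
    if maf >= 0.05:
        return "common"
    if maf >= 0.01:
        return "uncommon"
    return "rare"


# With 95 sequenced subjects there are 190 chromosomes.
singleton = classify_maf(maf_from_counts(1, 95))    # 1/190, about 0.005
doubleton = classify_maf(maf_from_counts(2, 95))    # 2/190, about 0.011
ten_copies = classify_maf(maf_from_counts(10, 95))  # 10/190, about 0.053
```

Under these assumptions, a variant seen once among the 95 sequenced subjects falls in the rare class, while a variant seen on two chromosomes already counts as uncommon.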

Figure 1

(a) Minor allele frequency distribution of LPL variants identified in African Blacks (n=95). (b) Distribution of LPL variants by location in the gene and comparison with size distribution of those locations. (c) Number and locations of LPL variants identified in African Blacks (n=95).

Follow-up genotyping of selected LPL variants in the entire sample

LD and Tagger analyses were performed on 130 common variants (MAF≥0.05) to identify and select the tagSNPs for follow-up genotyping in the entire sample; 92 common tagSNPs were selected using the r2 cut-off of 0.9 (Supplementary Table 2). We also compared our Tagger results with those from the HapMap YRI population. All common tagSNP bins identified in HapMap YRI data for the same genomic region using the same parameters were also captured in our data. In addition to common tagSNPs, we selected the following variants for follow-up genotyping: (i) variants located in exons or intron–exon junctions, (ii) uncommon/rare variants (MAF<0.05) present in ≥2 individuals included in sequencing and (iii) suspicious (borderline quality) novel variants identified in sequencing. Additionally, three known common LPL SNPs that were not detected in our sequencing (see section DNA resequencing results) were also included in follow-up genotyping. In total, 163 variants were selected for follow-up genotyping in addition to 1 variant (rs320:T>G) that was already genotyped as part of an earlier study.16 Of the 164 variants (92 tagSNPs, 72 others) advanced to the next stage, 30 failed the assay design for genotyping (23 tagSNPs, 7 others) and 2 suspicious variants turned out to be monomorphic after genotyping (indicating sequencing artifacts). Of the 132 successfully genotyped variants (Supplementary Table 3), one was triallelic and not included in downstream analyses. As a part of QC, five additional SNPs (two tagSNPs, three others) were excluded from downstream analyses; two had a low call rate and three deviated from HWE. Thus, a total of 126 variants (67 common and 59 uncommon/rare variants based on their frequency in our entire sample) were included in downstream association analyses. The LD bins of these 126 variants are shown in Supplementary Table 4.
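To make the tagSNP selection step more concrete, the following is a simplified sketch of greedy pairwise tagging at an r2 threshold. It is written in the spirit of pairwise tagging and is not claimed to reproduce the Tagger implementation in Haploview; the function name `greedy_tag_snps`, the SNP labels and the r2 values are all hypothetical.

```python
def greedy_tag_snps(snps, r2, threshold=0.9):
    """Pick tag SNPs so every SNP has r2 >= threshold with some tag.

    snps: list of SNP identifiers.
    r2:   dict mapping frozenset({a, b}) to the pairwise r2 of a and b;
          missing pairs are treated as r2 = 0.
    """
    def tagged_by(tag):
        # A SNP always tags itself; others need r2 >= threshold with it.
        return {s for s in snps
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs;
        # ties go to the SNP listed first.
        best = max(snps, key=lambda s: len(tagged_by(s) & untagged))
        tags.append(best)
        untagged -= tagged_by(best)
    return tags


# Hypothetical four-SNP region (labels and r2 values are made up).
toy_snps = ["snpA", "snpB", "snpC", "snpD"]
toy_r2 = {
    frozenset(("snpA", "snpB")): 0.95,
    frozenset(("snpB", "snpC")): 0.92,
    frozenset(("snpA", "snpC")): 0.60,
    frozenset(("snpC", "snpD")): 0.30,
}
toy_tags = greedy_tag_snps(toy_snps, toy_r2)  # ["snpB", "snpD"]
```

In this toy region, snpB captures snpA and snpC at r2≥0.9, and snpD forms its own bin, so two tags cover four SNPs. A stricter threshold yields more tags, which is why a cut-off of 0.9 retains a large fraction of the common variants as tagSNPs.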

Association of common variants (MAF≥0.05) with lipid/lipoprotein levels

Table 3 summarizes the single-site analyses results for 17 common variants that showed nominal associations (P<0.05) with one or more of four lipid traits (HDL-C, LDL-C, ApoA1 or ApoB) in 788 ABs. The results for all 67 common SNPs for all five tested lipid traits are summarized in Supplementary Tables 5. A post hoc power analysis showed that our power to detect a causal SNP would be 78–92%, given the proportion of the variance of the lipid traits explained by these SNPs (ranging from 1.0% to 1.4%) and a significance level of 0.05.

Table 3 Common LPL variants nominally (P<0.05) associated with plasma lipid/lipoprotein levels

The most significant SNP, rs252:delA (intronic variant), was associated with LDL-C (β=−1.037; P=0.002; FDR=0.134) and ApoB (β=−2.364; P=0.012; FDR=0.328) levels. This is a novel association given that the rs252:delA SNP was not in high LD (r2≤0.20) with other nominally associated SNPs (Figure 2), including two functional LPL variants rs1801177:G>A (p.(Asp36Asn)) and rs13702:C>T (resides in 3′-UTR and disrupts a microRNA-410 recognition element seed site).30 One additional SNP, rs74304285:G>A, which again was not highly correlated (r2≤0.11) with other relevant SNPs, was also associated with both LDL-C (β=0.995; P=0.019; FDR=0.627) and ApoB (β=3.014; P=0.010; FDR=0.328). In addition, four other weakly correlated (r2≤0.20) SNPs (rs1801177:G>A, rs8176337:G>C, rs329:A>G and rs12679834:T>C) were nominally associated with ApoB levels, of which rs12679834:T>C was also associated with ApoA1 levels. Two of these SNPs (rs1801177:G>A and rs8176337:G>C) were previously reported to be associated with TG,31, 32 but showed only 0.05<P<0.20 for TG levels in our study.
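Under the additive model named in the Methods, a β value is read as the change in the trait per copy of the coded allele. The sketch below shows that reading with ordinary least squares on one predictor. It is a minimal illustration only: the function name `additive_beta` and the six data points are hypothetical, and any covariates or trait transformations used in the actual analysis (described in Supplementary Data) are not modeled.

```python
def additive_beta(genotypes, trait):
    """Least-squares intercept and per-allele slope for 0/1/2 genotypes."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    # Slope = covariance(genotype, trait) / variance(genotype).
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    return intercept, slope


# Hypothetical data: genotype coded as the number of minor alleles,
# trait in arbitrary units (not values from the study).
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_trait = [50.0, 52.0, 49.0, 51.0, 48.0, 50.0]
toy_intercept, toy_beta = additive_beta(toy_genotypes, toy_trait)
# toy_beta is -1.0: each extra minor allele lowers the mean trait by 1 unit.
```

A negative β therefore indicates that carriers of the coded allele have lower mean trait values, and a positive β indicates higher values, with heterozygotes expected midway between the two homozygote groups.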

Figure 2

LD structure of 17 LPL SNPs associated with one or more lipid/lipoprotein traits (HDL-C, LDL-C, ApoA1 or ApoB levels). The values in the cells are the pairwise degree of LD indicated by r2 × 100. Shades of white indicate r2=0, shades of gray indicate 0<r2<1 and shades of black indicate r2=1.

The second most significant SNP, rs316:C>A (synonymous variant), was associated with HDL-C (β=0.68; P=0.003; FDR=0.178) and ApoA1 (β=1.292; P=0.022; FDR=0.288) levels. The rs316:C>A SNP was in LD with four intronic SNPs, including rs279:C>G (P=0.029; FDR=0.233; r2=0.54) and rs295:A>C (P=0.043; FDR=0.291; r2=0.46) associated with HDL-C, and rs301:T>C (P=0.008; FDR=0.212; r2=0.69) and rs320:T>G (P=0.018; FDR=0.212; r2=0.62) associated with HDL-C and ApoA1. The rs320:T>G SNP (HindIII polymorphism) is well known and has been consistently associated with HDL-C and TG levels in several studies.18, 32, 33 Five more SNPs located in the 3′-UTR or 3′ flanking region and in LD (r2>0.40) with each other or with the above five SNPs also yielded nominal associations with HDL-C: rs1059507:C>T (β=0.689; P=0.031; FDR=0.233), rs13702:C>T (β=−0.490; P=0.011; FDR=0.212), rs3916027:G>A (β=0.421; P=0.028; FDR=0.233), rs4921683:T>A (β=0.757; P=0.019; FDR=0.212) and rs4921684:C>T (β=0.869; P=0.016; FDR=0.212). Of 10 SNPs associated with HDL-C in our sample, 7 (rs301:T>C, rs316:C>A, rs320:T>G, rs1059507:C>T, rs13702:C>T, rs3916027:G>A and rs4921683:T>A) also exhibited suggestive evidence of association with HDL-C in a recent meta-analysis.34 Six of them (rs301:T>C, rs316:C>A, rs320:T>G, rs1059507:C>T, rs4921683:T>A, rs4921684:C>T) were also nominally associated with ApoA1 in our study. Another widely studied LPL variant, rs328:C>G/p.(Ser474Ter) (p.(Ser447Ter) in mature protein excluding the signal peptide), occurred at lower frequency (4.2%) in our African sample (>10% in Europeans) and its established association with TG and HDL-C in Europeans was not replicated here.

Altogether, we found eight nominally associated (P<0.05) and not highly correlated (r2<0.40) signals: rs1801177:G>A, rs8176337:G>C, rs74304285:G>A, rs252:delA, rs316:C>A, rs329:A>G, rs12679834:T>C and rs4921684:C>T. To our knowledge, rs74304285:G>A and rs252:delA have not been tested before for association with lipids.

Association of uncommon/rare variants (MAF<0.05) with lipid/lipoprotein levels

For the SKAT-O analysis of 59 uncommon/rare variants with lipid traits, 3 bins were generated using 3 MAF thresholds (bin 1 (59 variants with MAF<0.05), bin 2 (32 variants with MAF≤0.02) and bin 3 (22 variants with MAF≤0.01)) and test statistics were calculated for each bin separately. Whereas bin 1 was associated with ApoB (P=0.016), bin 3 showed significant association with both TG (P=0.039) and LDL-C (P=0.027; Table 4). In post hoc single-site analysis of the 22 rare variants in bin 3 (Supplementary Table 10), 7 (including 3 novel) showed nominal association with one or more lipid/lipoprotein traits (TG, HDL-C, LDL-C or ApoA1).

Table 4 Results of rare variant association analyses (SKAT-O) of the LPL gene

Haplotype-based association analysis results

A total of 123 overlapping windows, each containing four SNPs, were constructed using the sliding window approach (sliding one SNP at a time) for haplotype analysis of 126 variants with lipid traits (Figure 3). P-values were calculated based on the comparison of each haplotype with the most common reference haplotype. The strongest haplotype effects were observed on HDL-C, followed by ApoB, comprising 19 and 18 nominally significant (P<0.05) global P-values, respectively (Supplementary Table 11). The most significant window (‘window 81’ containing rs313:A>G, rs314:A>G, rs77434393:G>A and rs316:C>A) was associated with both HDL-C (global P=2.32E-04) and ApoA1 (global P=0.021). The only SNP in this window that showed association in single-site analysis was rs316:C>A (P=0.003 with HDL-C; P=0.022 with ApoA1) and thus the observed haplotype association seemed to be primarily driven by this SNP. Whereas the haplotypes in the intron 6–intron 8 region were associated with HDL-C and ApoA1, those in the intron 8–intron 9 region were associated with ApoB. Although no common LPL SNPs were associated with TG in single-site analysis, seven haplotype windows showed nominal associations with TG. Overall, haplotype analysis was more informative for TG and LDL-C than single-site analysis.
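The window count above follows directly from the sliding scheme: n ordered variants with a window of size w and a step of one give n − w + 1 windows. A minimal sketch, using placeholder variant labels rather than the actual SNP identifiers (the function name `sliding_windows` is hypothetical):

```python
def sliding_windows(items, size=4, step=1):
    """Return consecutive overlapping windows over an ordered list."""
    return [items[i:i + size]
            for i in range(0, len(items) - size + 1, step)]


# Placeholder labels for 126 variants in chromosomal order.
variant_labels = [f"variant_{i}" for i in range(1, 127)]
windows = sliding_windows(variant_labels, size=4, step=1)
n_windows = len(windows)  # 126 - 4 + 1 = 123
```

Each interior variant appears in up to four windows, so neighboring windows are not independent tests, which is worth keeping in mind when counting nominally significant windows.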

Figure 3

Haplotype association results for HDL-C (a), TG (b), LDL-C (c), ApoA1 (d) and ApoB (e) levels. The log of the global P-value is presented on the y axis and SNPs are presented across the x axis in chromosomal order. Horizontal lines are 4-SNP haplotype windows. The red horizontal line shows the significance threshold. A full color version of this figure is available at the European Journal of Human Genetics journal online.

Functional annotation of significant SNPs using RegulomeDB

ENCODE annotations were retrieved for all variants using RegulomeDB35 (Supplementary Table 1). Of the 17 nominally associated SNPs, 10 were scored in RegulomeDB (Table 3); 2 of these (rs316:C>A and rs12679834:T>C) were cis-eQTL (expression quantitative trait loci) SNPs (score=1f), but the remaining SNPs showed minimal evidence for regulatory function (scores=4–6).

Discussion

To our knowledge, this is the first study of its kind that reports a comprehensive catalog of sequence variation in the entire LPL gene in African individuals in relation to plasma lipid traits. Resequencing of 95 ABs with extreme HDL-C levels identified 308 variants, of which 64 were novel. Recently, our group also reported the resequencing of the entire LPL gene in 95 non-Hispanic whites (NHWs) with extreme HDL-C levels;36 176 variants were identified in 95 NHWs, of which 113 were also found in 95 ABs. As expected, we observed more variation (especially more uncommon/rare variants) in ABs. Nickerson et al37 sequenced a portion of LPL (9.7 kb) in 71 individuals randomly selected from three populations, including African Americans (AAs) from Jackson, Mississippi (n=24). We observed more variants (97 vs 78, of which 71 shared) in this 9.7 kb region in ABs as compared with AAs in Nickerson et al.37 Observed differences could be due to differences in genetic background (ABs vs AAs), sample size (95 vs 24), selection criteria (subjects with extreme lipid profiles vs randomly selected subjects) and/or software tools used for variant analyses (Variant Reporter vs PolyPhred).

Following the discovery stage, 164 variants were selected for follow-up in our entire sample (n=788), of which 126 successfully passed all genotyping steps (assay design, run and post-run QC) and were analyzed for association with lipid traits. Single-site analysis of common SNPs revealed 10 nominal associations (P<0.05) with HDL-C, 3 with LDL-C, 6 with ApoB and 8 with ApoA1 levels. Our observation of a higher number of associations with HDL-C is consistent with the published data.32 In Europeans, several LPL SNPs have been reported to be associated with lipid levels, including six (rs268:A>G, rs326:G>A, rs320:T>G, rs328:C>G, rs1801177:G>A and rs13702:C>T) consistently associated with HDL-C and/or TG.32, 38, 39, 40, 41 Among these, rs268:A>G was absent in our African sample and in HapMap YRI data. The rs326:G>A SNP has also been shown to be associated with HDL-C in AAs.42 Although rs326:G>A showed no association in our sample, it was in LD with HDL-associated SNP rs13702:C>T (P=0.011; FDR=0.212; r2=0.64). The association of rs13702:C>T with HDL-C has also been reported in AAs.43, 44 Although rs13702:C>T was shown to be in LD with rs320:T>G in Europeans,32 the correlation was weak in our African sample (r2=0.27). Replication of the association of rs13702:C>T with HDL-C in multiple African-derived samples and its potential to disrupt a microRNA recognition site30 support its functional role in HDL metabolism in African-descent subjects. Common SNPs in/around LPL have also been reported in several lipid GWAS conducted mostly in European-descent subjects.45, 46, 47 A large GWAS conducted for CHD and risk factors in AAs48 replicated 17 European loci, including two HDL-C associated SNPs (rs10503669:C>A and rs10096633:C>T) located downstream of LPL (outside of our target region).

The rs328:C>G (p.(Ser447Ter)) SNP has been shown to increase LPL activity and cause lower TG and higher HDL-C levels.16, 18, 33, 38 Although its frequency is relatively high in Whites (MAF>10%), it is low in our African sample (4.2%) and in the HapMap YRI population (3.3%). Although the potential functional significance of rs328:C>G is appreciated in Europeans, recent studies in AAs have sparked debate on its role.43, 44 An admixture mapping study of 3300 AAs determined that the effect size of rs328:C>G on TG was dependent on ancestral background and significantly diminished in subjects with African background,43 suggesting that there might be other truly causal undiscovered variant(s) in LD (or acting synergistically) with rs328:C>G. A cross-sectional and longitudinal study found rs328:C>G to be significantly associated with both HDL-C and TG in European Americans but only with TG in AAs.44 A recent large GWAS in AAs failed to show its association with HDL-C or TG.48 Consistent with most reports in AAs, we did not observe association of this SNP with HDL-C or TG in our sample. The HindIII (rs320:T>G) polymorphism has been shown to have the same effects on TG and HDL-C as rs328:C>G, and there has been a debate over whether their effects are independent.16, 33, 39, 49, 50, 51 The rs320:T>G SNP is predicted to affect the binding of a transcription factor and so may be functional by itself.52 In our data, the association of rs320:T>G with HDL-C (β=0.485; P=0.018; FDR=0.212) was independent of rs328:C>G; however, it was in LD with four other HDL-associated SNPs: rs301:T>C (P=0.008; FDR=0.212; r2=0.81), rs316:C>A (P=0.003; FDR=0.178; r2=0.62), rs295:A>C (P=0.043; FDR=0.291; r2=0.59) and rs3916027:G>A (P=0.028; FDR=0.233; r2=0.45) that have also been reported to be associated with HDL-C.34, 43, 46 Further studies are needed to understand whether rs320:T>G explains the other observed associations or whether yet to be discovered functional variant(s) exist in this region.

Associations of LPL SNPs with LDL-C have not been well documented in the literature, although some contrasting results exist for rs320:T>G.53, 54 We have identified two independent (r2=0) LDL-associated common SNPs, rs252:delA (P=0.002; FDR=0.134) and rs74304285:G>A (P=0.019; FDR=0.627), as novel observations that warrant further investigation in independent studies. We have also detected a group of 22 rare variants (MAF≤0.01; 11 in UTR, 5 coding and 6 intronic) that showed significant association with LDL-C (P=0.027) and TG (P=0.039), including 4 known nonsynonymous variants with no previous report of association with lipids and 10 novel variants. This association probably represents cumulative effects of functional rare variants; however, further studies are needed to understand the impact of these variants on plasma lipid profile. The results of haplotype and single-site analyses were largely comparable for HDL-C, ApoA1 and ApoB. However, for TG and LDL-C, several haplotype windows (some of which harbor only uncommon/rare variants) yielded significant associations despite the absence of (TG) or only a few (LDL-C) nominal associations in single-site analyses.

To our knowledge, this is the first study that has comprehensively evaluated LPL genetic variation in relation to lipid traits in ABs. Our study provides new information in addition to supporting some previous observations, but has some limitations. Our sequencing sample was small (some relevant rare variants might have been missed) but our selective resequencing of subjects with ‘extreme’ phenotypes still enabled us to identify several rare variants. Given the known role of LPL in lipid metabolism and that several tested SNPs were not completely independent, the strict Bonferroni correction was not applied. After multiple testing correction using FDR, our top two associations (rs252:delA with LDL-C and rs316:C>A with HDL-C) still looked promising. However, given our modest sample size and generally small effect sizes of lipid-associated variants, our new observations warrant further evaluation in independent studies.