Individuals with exceptional longevity and their offspring have significantly larger high-density lipoprotein cholesterol (HDL-C) particle sizes due to increased homozygosity for the I405V variant in the cholesteryl ester transfer protein (CETP) gene. In this study, we further investigate the association between CETP and HDL-C to identify novel, independent CETP variants associated with HDL-C in humans.
We performed a meta-analysis of HDL-C within the CETP region using 59,432 individuals imputed with 1000 Genomes data. We performed replication in an independent sample of 47,866 individuals and validation was done by Sanger sequencing.
The meta-analysis of HDL-C within the CETP region identified five independent variants, including an exonic variant and a common intronic insertion. We replicated all five variants in an independent sample of 47,866 individuals. Sanger sequencing of the insertion within a single family confirmed segregation of this variant. The strongest previously reported association between HDL-C and a CETP variant was for rs3764261; however, after conditioning on the five novel variants we identified, the support for rs3764261 was strongly reduced (βunadjusted=3.179 mg/dl (P value=5.25×10−509), βadjusted=0.859 mg/dl (P value=9.51×10−25)). This finding suggests that these five novel variants may partly explain the association of CETP with HDL-C. Indeed, three of the five novel variants (rs34065661, rs5817082, rs7499892) are independent of rs3764261.
The causal variants in CETP that account for the association with HDL-C remain unknown. We used studies imputed to the 1000 Genomes reference panel for fine mapping of the CETP region. We identified and validated five variants within this region that may partly account for the association of the known variant (rs3764261), as well as other sources of genetic contribution to HDL-C.
Aging is characterized by a deterioration in the maintenance of homeostatic processes over time, leading to functional decline and increased risk for disease and death.1 One of the genes linked to healthy aging and longevity is the cholesteryl ester transfer protein (CETP) gene.1,2 Homozygosity for the 405V variant (405VV) of CETP is associated with lower concentrations of CETP, higher concentrations of high-density lipoprotein cholesterol (HDL-C), and greater HDL-C particle size, all of which are associated with both protection against cardiovascular disease3 and exceptional longevity.4
Functional analyses in mice,5 hamsters,6 and rabbits7 have revealed that the protein encoded by the CETP gene mediates the transfer of cholesteryl esters from HDL-C to other lipoproteins, such as atherogenic (V)LDL particles, and is a key participant in the reverse transport of cholesterol from the periphery to the liver.8 Because of the function of CETP and the association of the gene with HDL-C in humans,9,10 the CETP gene is one of the targets for drug development for dyslipidemia.6,11,12 CETP inhibition increases HDL-C by 30% up to 140%, depending on the compound used. The first drug of its class, torcetrapib, was unfortunately associated with increased mortality and morbidity in patients receiving the CETP inhibitor in addition to atorvastatin.13,14
The estimated heritability of HDL-C levels is high in humans: 47–76%.15,
To this end, we used a meta-analysis of association studies with imputed genotypes within the CETP region. Our study consisted of data from 59,432 samples, of which the genotypes were imputed to the 1000 Genomes project reference panel (version Phase 1 integrated release v3, April 2012, all populations). By using 1000 Genomes imputed data, we expected to find more rare or low-frequency variants, as well as novel insertions and deletions.
Materials and Methods
The descriptions of the participating cohorts can be found in the Supplementary Material. All studies were performed with the approval of the local medical ethics committees, and written informed consent was obtained from all participants.
Study samples and phenotypes
The total number of individuals in the discovery phase was 59,432 and in the replication phase 47,866. Of the discovery samples, 44,108 individuals (74.21%) were of European ancestry. Of the replication samples, 47,081 individuals (98.36%) were of European ancestry. A summary of the details of both the discovery and replication cohorts participating in this study can be found in Supplementary Table 1.
Genotyping and imputations
All cohorts were genotyped using commercially available Affymetrix or Illumina genotyping arrays, or custom Perlegen arrays. Quality control was performed independently for each study. To facilitate meta-analysis and replication, each discovery and replication cohort performed genotype imputation using IMPUTE229 or Minimac30 with reference to the 1000 Genomes project reference panel. The details per cohort can be found in Supplementary Table 2.
Association analysis in discovery cohorts
The lipid measurements were adjusted for sex, age, and age² in all cohorts, and if necessary also for cohort-specific covariates (Supplementary Table 1). Some cohorts included samples using lipid-lowering medication; we did not adjust for lipid-lowering medication in our analysis because HDL-C levels are only minimally influenced by lipid-lowering medication. Each discovery cohort ran association analysis for all variants within the CETP region (chromosome 16, 56.99–57.02 Mbp) with HDL-C.
Meta-analysis of discovery cohorts
The association results of all discovery cohorts for all variants within the CETP region (chromosome 16, 56.99–57.02 Mbp) were combined using inverse-variance weighting as applied by METAL.31 This tool also applies genomic control, automatically correcting the test statistics for small amounts of population stratification or unaccounted relatedness, and allows for heterogeneity. Prior to meta-analysis, we applied the following variant filters: 0.3 < R² < 1.0, where R² is the measure of imputation quality, and expected minor allele count (expMAC = 2 × MAF × R² × sample size) > 10. After meta-analysis of all available variants, we excluded variants that were not present in at least three cohorts, to prevent false-positive findings.
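The filtering and pooling steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the METAL implementation: the function names are hypothetical, the pooling is a plain fixed-effects inverse-variance estimate, and genomic control and heterogeneity handling are deliberately left out.

```python
import math


def expected_minor_allele_count(maf, r2, n):
    """expMAC = 2 x MAF x R^2 x sample size, as defined in the text."""
    return 2.0 * maf * r2 * n


def passes_filters(maf, r2, n, min_r2=0.3, min_expmac=10.0):
    """Apply the two pre-meta-analysis filters: 0.3 < R^2 < 1.0 and expMAC > 10."""
    return (min_r2 < r2 < 1.0) and expected_minor_allele_count(maf, r2, n) > min_expmac


def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance pooling of per-cohort effect estimates.

    Each cohort is weighted by 1 / se^2; the pooled s.e. is sqrt(1 / sum of weights).
    Genomic control (which METAL can apply) is NOT included in this sketch.
    """
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


# Illustrative (made-up) inputs: two cohorts reporting the same variant.
pooled_beta, pooled_se = inverse_variance_meta([2.0, 4.0], [1.0, 2.0])
print(pooled_beta, pooled_se)
```

The more precisely estimated cohort dominates the pooled estimate, which is why the pooled β in the example sits closer to 2.0 than to 4.0.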
Selection of independent variants
To select only variants that were independently associated with HDL-C, we used the Genome-wide Complex Trait Analysis (GCTA) tool, version 1.13.32 Although this tool currently supports multiple functionalities, we only used the function for conditional and joint genome-wide association analysis. This function performs a stepwise selection procedure to select independent single-nucleotide polymorphism (SNP) associations by a conditional and joint analysis approach. It utilizes summary-level statistics from the meta-analysis, and linkage disequilibrium (LD) correlations between SNPs are estimated from the 1000 Genomes reference panel (1000G Phase I Integrated Release Version 22 Haplotypes (2010–11 data freeze, 14 February 2012 haplotypes)). GCTA estimates the effective sample size and determines the effect size, the s.e., and the P value from a joint analysis of all the selected SNPs. In this way, we selected the best associated variants in CETP. We subsequently checked whether these variants were in LD within the 1000 Genomes reference panel using PLINK33 software (Supplementary Table 3).
Replication of independent CETP variants
Five variants were selected for replication in a sample of 12 independent cohorts: Athero-Express, CHS, FINCAVAS, LBC1936, Lifelines, LLS, NTR-NESDA, PREVEND, PROSPER, QIMR, TRAILS, and YFS. The lipid measurements were adjusted for sex, age, and age² in all cohorts, and if necessary also for cohort-specific covariates (Supplementary Table 1b). The details per cohort regarding variant genotyping and imputations can be found in Supplementary Table 2. The association results of all replication cohorts were combined and the s.e.-based weights were calculated by METAL.31 Since none of the five variants are in LD (Supplementary Table 3), the Bonferroni-corrected P value threshold for multiple testing was 0.01 (0.05/5).
Test of previously published results
The meta-analysis of HDL-C as published by Teslovich et al.9 identified 38 genome-wide significant (P value<5×10−8) variants within the CETP region (chromosome 16, 56.99–57.02 Mbp). Within all discovery and replication cohorts, we tested these 38 variants, adjusting for the 5 newly identified independent variants to explore whether the new variants explain previously published results. The association results of all cohorts were combined and the s.e.-based weights were calculated by METAL.31
We used the genotypes of all 1,092 individuals of the 1000 Genomes project to calculate the correlation between the 38 variants. This correlation matrix was used by matSpDlite34 which examines the ratio of observed eigenvalue variance to its theoretical maximum to determine the number of independent variables. For these 38 genome-wide significant variants within the CETP region, the effective number of independent variables is 18 and therefore the experiment-wide significance threshold required to keep type I error rate at 5% is 2.85×10−3.
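The threshold arithmetic in this paragraph can be illustrated with a short sketch. It assumes the commonly used variance-of-eigenvalues estimate, M_eff = 1 + (M − 1)(1 − Var(λ)/M), and a Šidák-type correction; whether matSpDlite arrived at the reported value of 18 through exactly this calculation is an assumption here, and the function names are hypothetical. For a correlation matrix, the eigenvalue variance can be obtained without an eigendecomposition, because the eigenvalues sum to M and their squares sum to the sum of squared matrix entries.

```python
def effective_number_of_tests(corr):
    """Variance-of-eigenvalues estimate of the effective number of tests.

    corr is a symmetric M x M correlation matrix (list of lists, unit diagonal).
    sum(lambda) = trace(corr) = M and sum(lambda^2) = sum of squared entries,
    so the sample variance of the eigenvalues is (sum r_ij^2 - M) / (M - 1).
    """
    m = len(corr)
    if m == 1:
        return 1.0
    sum_sq = sum(r * r for row in corr for r in row)
    var_lambda = (sum_sq - m) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var_lambda / m)


def sidak_threshold(alpha, n_effective):
    """Per-test threshold that keeps the experiment-wide type I error at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_effective)


# With the 18 effective tests reported in the text:
print(sidak_threshold(0.05, 18))  # approximately 2.85e-3
```

Two limiting cases give a quick sanity check: fully uncorrelated variants (identity matrix) yield M_eff = M, and perfectly correlated variants yield M_eff = 1.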
Conditional analysis of independent CETP variants
The replicated independent variants were selected for conditional analysis in both the discovery and the replication cohorts. In this analysis we adjusted for the lead SNP for this region as reported by Teslovich et al.9 (rs3764261, chromosome 16, position 56,993,324 bp). The association results of all discovery and replication cohorts were combined and the s.e.-based weights were calculated by METAL.31 The Bonferroni-corrected P value threshold for multiple testing was 0.01 (0.05/5), since none of the five variants are in LD (Supplementary Table 3).
Validation of the new CETP insertion within a family
Within the ERF study, 3,658 individuals have been genotyped on various Illumina (Illumina, San Diego, CA, USA) and Affymetrix chips (Affymetrix, Santa Clara, CA, USA), followed by imputations with MaCH (1.0.18c) and Minimac (minimac-β-14 March 2012) to the 1000 Genomes reference panel. Based on the best guess imputed genotypes, we selected one family in which we expected the insertion to segregate.
Validation of the insertion was performed by Sanger sequencing. Genomic DNA was isolated from peripheral blood using a standard salting-out protocol. Intron 2–3 of the CETP gene (Supplementary Table 4) was amplified by PCR using the following primers: forward, 5ʹ-tgggggactcaggtctctcc-3ʹ; reverse, 5ʹ-aaagcacctggcccacaacc-3ʹ; product size 409 bp.
PCR reactions were performed in 17.5 μl containing 37.5 ng DNA, 10 pmol/μl of each primer, 2.5 mM dNTPs, 10x PCR buffer with Mg2+ (Roche) and 5 U/μl FastStart Taq (Roche Nederland B.V., Woerden, the Netherlands). Cycle conditions: 7 min at 94 °C; 10 cycles of 30-s denaturation at 94 °C, 30-s annealing starting at 70 °C and decreasing by 1 °C per cycle, and 90-s extension at 72 °C; followed by 20 cycles of 30-s denaturation at 94 °C, 30 s at 60 °C, and 90 s at 72 °C; final extension 10 min at 72 °C. Sephadex G50 (Amersham Biosciences) was used to purify the sequenced PCR products. Direct sequencing of both strands was performed using Big Dye Terminator chemistry version 4 (Applied Biosystems, Bleiswijk, the Netherlands). Fragments were loaded on an ABI3100 automated sequencer and analyzed with DNA Sequencing Analysis (version 5.3) and SeqScape (version 2.6) software (Applied Biosystems). All sequence variants are numbered at the nucleotide level according to the following references: NC_000016.10:g.56963437_56963438insA (NCBI), NM_000078.2:c.233+313_233+314insA, Human Feb. 2009 (GRCh37/hg19) Assembly.
Meta-analysis in all discovery cohorts to select independent variants
The association of all variants within the CETP region (chromosome 16, 56.99–57.02 Mbp) to HDL-C was tested in all discovery cohorts. These results were combined using the inverse-variance weights as applied by METAL.31 After exclusion of the variants that were not present in at least 3 cohorts, 254 variants remained (Figure 1). A conditional and joint analysis of the 254 variants using GCTA identified 5 independent variants (Figure 2). Three variants were intronic (rs5817082, rs4587963, and rs7499892), one variant was intergenic (rs12920974) and one variant was exonic (rs34065661) (Table 1). Using PLINK software,33 we calculated the LD between the five variants based on the 1000 Genomes reference panel, and found that none are in high LD with each other (Supplementary Table 3).
Replication of the independent CETP variants
The five independent variants within the CETP region were selected for replication within the following cohorts: Athero-Express, CHS, FINCAVAS, LBC1936, Lifelines, LLS, NTR-NESDA, PREVEND, PROSPER, QIMR, TRAILS, and YFS. Five variants were replicated at a P value of 2.99×10−34 (Figure 3 and Table 2).
Test to explain the previously published results
In each discovery and replication cohort, we tested whether the five independent variants explain the associations within the CETP region (chromosome 16, 56.99–57.02 Mbp) as reported in the study by Teslovich et al.9 We tested a total of 38 genome-wide significant (P value<5×10−8) SNPs within this region identified by Teslovich et al.9 and conditioned on the five independent variants in all discovery and replication cohorts. Without adjusting for the five independent variants we identified in this work, all 38 variants were significantly associated with HDL-C in our joint analyses (P value corrected for multiple testing<2.85×10−3), and 37 (97.37%) were genome-wide significant (P value<5×10−8), despite the fact that our sample size is about 65% of that of the study by Teslovich et al.9 (Table 3). When conditioning on the five variants identified in this work, 27 (71.05%) variants remained significant (P value<2.85×10−3), though the strength of the associations was markedly reduced (Table 3). This finding suggests that the new variants we identified may explain in part the previously reported association. Remarkably, the support for rs3764261, which was reported as the lead SNP for this CETP region by Teslovich et al.,9 was strongly reduced: the P value changed from 5.25×10−509 to 9.51×10−25, while the β decreased from 3.179 mg/dl to 0.859 mg/dl. This variant is not in LD with any of the five new variants. Because of the lack of LD, the s.e. of rs3764261 does not change much (s.e.unadj=0.066, s.e.adj=0.084), but the effect of rs3764261 does (βunadj=3.179, βadj=0.859); the χ2 statistic, (β/s.e.)2, therefore decreases as well, which results in a higher P value. This indicates that a part of the effect of rs3764261 can be explained by the effect of the five new variants.
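The link between β, s.e., the χ2 statistic, and the P value described above can be made concrete with a small sketch. It assumes a two-sided Wald test on 1 degree of freedom (χ2 = (β/s.e.)2) and works on the log10 scale, because P values this small underflow ordinary floating-point numbers. The function name is hypothetical, and because the β and s.e. values quoted in the text are rounded, the output matches the published P values only in order of magnitude.

```python
import math


def log10_p_two_sided(beta, se):
    """log10 of the two-sided normal (1-df chi-square) P value for z = beta / se."""
    z = abs(beta / se)
    if z < 30.0:
        # erfc is accurate and does not underflow in this range.
        return math.log10(math.erfc(z / math.sqrt(2.0)))
    # Asymptotic upper tail for large z: P ~ 2 * phi(z) / z * (1 - 1/z^2).
    log_p = (math.log(2.0) - 0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
             - math.log(z) + math.log1p(-1.0 / (z * z)))
    return log_p / math.log(10.0)


# Rounded values for rs3764261 quoted in the text.
unadjusted = log10_p_two_sided(3.179, 0.066)  # chi2 = (3.179 / 0.066)^2, about 2320
adjusted = log10_p_two_sided(0.859, 0.084)    # chi2 = (0.859 / 0.084)^2, about 105
print(unadjusted, adjusted)
```

With a nearly unchanged s.e., the roughly 3.7-fold drop in β translates into a more than 20-fold drop in χ2, which accounts for the shift of several hundred orders of magnitude in the P value.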
Conditional analysis of the independent CETP variants
Next, we performed conditional analysis of the independent variants in both the discovery and replication cohorts. We conditioned on the lead SNP for the CETP region as reported by Teslovich et al.9 (rs3764261, chromosome 16, position 56,993,324 bp); see Table 4 and Figure 4. This analysis showed that three of the five variants (rs34065661, rs5817082, rs7499892) are independent of rs3764261. For all variants the β values decreased and the associations weakened, but all P values remained significant. The effects of the single variant rs34065661, of the insertion rs5817082, and of the single variant rs7499892 were reduced by 53.20%, 38.48%, and 32.67%, respectively.
Validation of the insertion within a family
Based on the best guess imputations of the ERF study, we selected a large family of 30 individuals for Sanger sequencing of rs5817082. Using MERLIN35 we estimated that the total heritability of HDL-C within this family is 27.47%. DNA was available for 16 individuals. Figure 5 shows the results of the Sanger sequencing for rs5817082 for these 16 individuals within the family. The sequencing of the insertion confirmed the best guess results for 10 individuals (62.5%), of whom 7 were heterozygous for the insertion, 1 was homozygous for the insertion, and 2 did not carry the insertion. Three individuals who are homozygous for the insertion were predicted to be heterozygous by the best guess imputations. Three individuals who are heterozygous for the insertion were not predicted to carry the insertion by the best guess imputations. Furthermore, the Sanger sequencing showed that the insertion segregates with the outcome within this family. The proportion of variance explained by the insertion within this family is 35.50%, while the proportion explained by rs3764261, the lead SNP within the CETP region as reported by Teslovich et al.,9 is 14.11%.
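The concordance figures in this paragraph can be tallied as follows. This is only a bookkeeping sketch: genotypes are coded as the number of insertion alleles (0, 1, or 2), which is an illustrative convention rather than the format used in the study, and the pairs are reconstructed from the counts given in the text.

```python
# (best-guess imputed, Sanger-confirmed) insertion-allele counts per individual,
# reconstructed from the counts reported in the text.
pairs = (
    [(1, 1)] * 7    # concordant heterozygous carriers
    + [(2, 2)] * 1  # concordant homozygous carrier
    + [(0, 0)] * 2  # concordant non-carriers
    + [(1, 2)] * 3  # Sanger homozygous, imputed heterozygous
    + [(0, 1)] * 3  # Sanger heterozygous, imputed non-carrier
)


def summarize(genotype_pairs):
    """Return (n, n_concordant, concordance, n_undercalled)."""
    n = len(genotype_pairs)
    concordant = sum(1 for imputed, sanger in genotype_pairs if imputed == sanger)
    undercalled = sum(1 for imputed, sanger in genotype_pairs if imputed < sanger)
    return n, concordant, concordant / n, undercalled


n, concordant, rate, undercalled = summarize(pairs)
print(n, concordant, rate, undercalled)
```

Laid out this way, all six discordant calls are cases in which the best guess imputation assigned one insertion allele fewer than Sanger sequencing found.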
We conducted an analysis to fine map the association between CETP genetic variants and HDL-C. To this end, a total of 59,432 samples were imputed to the latest version of the 1000 Genomes (version Phase 1 integrated release v3, April 2012, all populations). We identified and replicated five independent variants within the CETP region (chromosome 16, 56.99–57.02 Mbp), of which four are SNPs and one is an insertion. We validated the insertion by Sanger sequencing within a large family, as the largest effect on HDL-C comes from this insertion.
The relationship between the CETP gene and HDL-C has been known for a long time9 and genome-wide association studies have revealed many common and rare variants in this region. Although the associated genetic variants are strongly correlated with HDL-C, the causal variants have not been determined. Our study showed that when using the latest 1000 Genomes reference panel, we have more power to fine map this association. By conditional analysis of the five variants, we were able to reduce the P values of the genome-wide significant associations published before by Teslovich et al.9 Furthermore, conditional analysis showed that three out of the five variants are independent of the lead SNP for the CETP region as reported by the study by Teslovich et al.9 (rs3764261).
Several fine-mapping efforts have been published previously,36,37 and all of them used sequencing for the fine mapping. In our project we did not use sequencing, but imputation with the 1000 Genomes as a reference panel. This method has been widely used in the past and is much lower in cost. With new reference panels available, we were able to revisit this region. The 1000 Genomes reference panel consists of 30 million variants, including a million insertions and deletions. By using this reference panel for imputation, we were able to impute these insertions and deletions in 59,432 samples from various cohorts. This led to the significant association of an insertion within a known region with HDL-C. So far, no association between a structural variation and HDL-C has been found in such a large sample size. Validation of the insertion by Sanger sequencing confirmed correct imputation of this insertion in 62.5% of the individuals: seven heterozygous carriers, one homozygous carrier, and two non-carriers.
The results of this study showed that by using the 1000 Genomes reference panel, the proportion of the variance explained can be increased and that multiple common variants in the same region may be implicated in a single family of the ERF study. The insertion we identified in this study explains 35.50% of variation in the HDL-C level in a single family of the ERF study; this is in concordance with the results of the whole-genome sequence data.23 This is much higher than the proportion of the variance explained (14.11%) in the same family by rs3764261, which was reported before as the lead variant of this region. Fine mapping of various associations may help us to unravel the genetic background of various phenotypes.
Although rs3764261 was identified by Teslovich et al.9 as the lead SNP of this region, other variants are used in clinical settings. Three of the classical variants are located in the promoter region of the CETP gene: the −1337C/T (rs708272 or Taq1B), −971G/A, and −629C/A (rs1800775) polymorphisms.38 Carriers of the B2 allele of the common Taq1B polymorphism exhibit lower plasma CETP levels and higher HDL-C. Furthermore, a recent meta-analysis showed that the B2 allele is associated with a reduced risk for coronary heart disease.39 One more classical variant is rs5882A (405I/V), which is located outside the promoter region.40 The −1337C/T and −629C/A polymorphisms are in strong LD with each other; however, they are in very low LD with rs3764261 (r2 of 0.442 for rs708272 and 0.461 for rs1800775), despite the fact that all three variants are within 3,000 bp of each other.
Large HDL-C particle sizes have been associated with exceptional longevity before and with an increased homozygosity for the I405V variant within the CETP gene.1,
Some genetic variants identified in our study were published before,41,42 but so far no conditional analyses have been performed with these variants. Our study suggests that various CETP variants may be relevant for HDL-levels in the blood circulation and that these may have a substantial role in the heritability of HDL-C in specific families.
About this article
Cardiovascular Research (2018)