Introduction

High levels of total cholesterol and low-density lipoprotein (LDL) cholesterol in the blood are major causes of cardiovascular diseases.1 Blood cholesterol levels are determined by both genetic and lifestyle factors. Over the past decade, genome-wide association studies (GWAS) have identified more than 150 genetic loci associated with blood cholesterol levels, explaining ~13–15% of the variation in these traits.2 However, a large majority of the previous studies on the genetic contribution to cholesterol levels considered only single nucleotide polymorphisms (SNPs), an approach that can miss the influence of structural variation in the human genome on cholesterol levels.2

Haptoglobin is a highly abundant protein in the blood that binds free hemoglobin to prevent iron loss and oxidative tissue damage during hemolysis.3, 4 A common deletion of 1.7 kb (hg19, chr16:72,090,310-72,092,029) was identified in exons 3 and 4 of the HP gene, resulting in a variant allele, HP1, which encodes a protein product that forms a dimer in circulation. Individuals who are homozygous for the HP2 allele (without the deletion) have haptoglobin multimers in circulation.4, 5 Each of the HP1 and HP2 alleles can be further differentiated by nucleotide polymorphisms that cause haptoglobin to run faster or slower during gel electrophoresis,6, 7 referred to as ‘F’ and ‘S’, respectively.5 Because of the complex genetic structure of the HP gene, it has been challenging to investigate the HP genetic polymorphisms, particularly the copy number variant (CNV) defining the HP1 and HP2 alleles, in relation to human traits.

Recently, Boettger et al.5 reported a new method to analyze haptoglobin polymorphisms in a European-ancestry population by creating a reference panel to impute the HP1/HP2 CNV. They also found the HP1 allele was associated with lower LDL and total cholesterol levels. To evaluate the generalizability of this method and finding, we investigated the associations of HP gene polymorphisms with blood cholesterol levels in Chinese women.

Materials and methods

Study populations and genetic data

Subjects for the current analysis were control participants in the Shanghai Breast Cancer Study (SBCS), a population-based case-control study, and participants of the Shanghai Women’s Health Study (SWHS), a population-based cohort study, both conducted in urban Shanghai. The SBCS is described in detail elsewhere.8, 9 Control subjects were recruited from permanent residents of Shanghai, and a peripheral blood sample was obtained from 74% of participants. The age range of participants was 20–70 years, with an average age of 49.9 years. The SWHS is an ongoing study of ~75 000 women who were recruited between 1997 and 2000, were between 40 and 70 years of age at recruitment and were permanent residents of one of seven communities in urban Shanghai. In-person interviews, anthropometric measurements and biological sample collection were carried out by trained interviewers. Approximately 75% of study participants provided a blood sample to the study. Details of the SWHS design and study implementation have been described elsewhere.10 Participants of these studies provided written informed consent, and the Institutional Review Boards of all participating institutions approved the study protocols.

In both the SBCS and SWHS, peripheral blood samples were collected from study participants during the in-person interview into EDTA-containing BD Vacutainer tubes, kept cool with blue ice packs during transportation and processed within 6 h of the blood draw. The samples were stored at –80 °C before biomarker assessment. The lipid profiles were measured using an ACE Clinical Chemistry System following the manufacturer’s protocol. For subjects whose triglyceride levels were under 400 mg dl–1, LDL cholesterol levels were calculated using the Friedewald equation, LDL=TC–HDL–(TG/5), where TC is total cholesterol, HDL is high-density lipoprotein cholesterol and TG is triglycerides; for all others, the LDL levels were directly measured. Genomic DNA extracted from buffy coats was genotyped in previous GWASs using primarily the Affymetrix Genome-Wide Human SNP Array 6.0 (Affy6.0) or Illumina Human660W BeadChip. Details of the GWAS methodology, including genotyping protocol, filtering and data cleaning procedures, have been previously described.9, 11, 12 Included in this study were 3608 participants from the SBCS (N=1875) and SWHS (N=1733) whose lipid and genetic data were generated in previous studies. Among the 3608 subjects, fasting information was available for 2433 participants, of whom 44.55% provided fasting samples and 55.45% provided non-fasting samples. Selected characteristics of the study population are shown in Table 1.
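The LDL rule described above (Friedewald estimate below the 400 mg dl–1 triglyceride threshold, direct measurement otherwise) can be sketched as follows. This is a minimal illustration rather than the study's actual processing code; the function name, argument names and the `measured_ldl` parameter are hypothetical, and all values are assumed to be in mg dl–1.

```python
def ldl_cholesterol(tc, hdl, tg, measured_ldl=None, tg_cutoff=400.0):
    """Return LDL cholesterol in mg/dl.

    tc, hdl, tg: total cholesterol, HDL cholesterol and triglycerides (mg/dl).
    measured_ldl: directly measured LDL, used when tg is at or above the cutoff.
    All names here are illustrative assumptions, not taken from the study.
    """
    if tg < tg_cutoff:
        # Friedewald equation: LDL = TC - HDL - TG/5
        return tc - hdl - tg / 5.0
    if measured_ldl is None:
        raise ValueError("direct LDL measurement required when TG >= cutoff")
    return measured_ldl


# Example: TC 200, HDL 50, TG 150 gives 200 - 50 - 30 = 120 mg/dl
print(ldl_cholesterol(200, 50, 150))
```

Raising an error when the direct measurement is missing, rather than silently applying the equation, keeps the two measurement routes distinct.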

Table 1 Selected characteristics of study subjects, Shanghai Breast Cancer Study and the Shanghai Women’s Health Study

Haptoglobin CNV imputation

The HP genetic structural variations were imputed within each GWAS data set using Beagle (v.3.2.2), which uses localized haplotype cluster models for efficient and accurate imputation.13 A phased reference panel developed by Boettger et al. was used to impute the HP alleles.14, 15 In brief, the phased reference panel was constructed using the HP structural data and SNP data for the subjects included in the 1000 Genomes Project. The HP structural data were generated based on droplet digital PCR (ddPCR). The SNP data were generated using the Omni 2.5 SNP array. SNPs located in a ~2 Mb region (hg19, chr16:71,088,193–73,097,663) spanning 1 Mb on either side of the HP CNV (hg19, chr16:72,090,310-72,092,029) were used to build the reference. SNPs within the HP CNV region were excluded. Two reference panels were built, a European panel and an African panel. For the European reference panel, data for 1277 genetic markers in 548 individuals from the CEU, TSI and IBS populations were included. For the African reference panel, data for 1276 genetic markers in 198 individuals from the YRI population were included. The genetic structure of East Asians is more similar to that of Europeans than to that of Africans. Therefore, in the present study, we used the European reference panel to impute the data. From our GWAS data sets, we extracted genotype data for SNPs located within 1 Mb on either side of the CNV (hg19, chr16:71,070,878-73,097,663), excluding those within the HP CNV region. The HP CNV alleles (HP1 or HP2) were then imputed against the reference panel using Beagle (v.3.2.2)13 within each data set. The HP1S and HP1F subtypes of HP1, and the HP2FS and HP2SS subtypes of HP2, were also imputed using Beagle.5 Haplotypes surrounding the HP gene, including HP1Ahap (for HP1S), hapHP1B (for HP1F), HP2Ahap (for HP2FS) and HP2Bhap (for HP2SS), were extracted from the phased data in Beagle as described in Boettger et al. (2016).5 On the basis of these data, we derived the HP genotypes, that is, HP1–HP1 (HP1F–HP1F, HP1F–HP1S, HP1S–HP1S), HP1–HP2 (HP1F–HP2FS, HP1S–HP2FS, HP1S–HP2SS) and HP2–HP2 (HP2FS–HP2FS and HP2FS–HP2SS). Participants from the SBCS and SWHS had a similar genotype distribution.
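The final step, deriving HP genotypes from the two imputed alleles of each participant, can be illustrated with a short sketch. It assumes that the phased imputation output has already been reduced to one subtype label per chromosome; the function name, the data layout and the toy sample below are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

# Each imputed subtype allele maps to its major (CNV-defined) allele.
SUBTYPE_TO_MAJOR = {"HP1F": "HP1", "HP1S": "HP1", "HP2FS": "HP2", "HP2SS": "HP2"}


def hp_genotype(allele_a, allele_b):
    """Return (major genotype, subtype genotype) for one individual.

    Alleles are sorted so that the result does not depend on phase order,
    e.g. ("HP2FS", "HP1S") and ("HP1S", "HP2FS") give the same genotype.
    """
    majors = sorted(SUBTYPE_TO_MAJOR[a] for a in (allele_a, allele_b))
    subtypes = sorted((allele_a, allele_b))
    return "-".join(majors), "-".join(subtypes)


# Toy data (hypothetical): one pair of imputed alleles per participant.
imputed = [("HP1S", "HP1S"), ("HP2FS", "HP1S"), ("HP2FS", "HP2FS"), ("HP1F", "HP2FS")]
major_counts = Counter(hp_genotype(a, b)[0] for a, b in imputed)
print(major_counts)
```

Counting the major genotypes in this way yields the kind of genotype distribution that can then be compared between data sets.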

Statistical analyses

Using geometric mean differences and P values derived from multiple linear regression models, we assessed associations between HP polymorphisms and blood cholesterol levels, including total cholesterol, LDL cholesterol, high-density lipoprotein (HDL) cholesterol and triglycerides. The linear regression models were run on log-transformed blood lipid data and adjusted for age and data source.
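As an illustration of this modelling approach, the sketch below fits an ordinary least-squares model to log-transformed lipid values with genotype indicators, age and data source as covariates, using HP2–HP2 as the reference group. It is a simplified stand-in written with the Python standard library only: the data are synthetic, the effect sizes are invented for the example, P values are omitted, and the statistical software actually used in the study is not specified in the text. Exponentiating a genotype coefficient gives the ratio of geometric means relative to the reference genotype.

```python
import math


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def design_row(genotype, age, source):
    """Intercept, HP1-HP2 indicator, HP1-HP1 indicator, age, data-source indicator."""
    return [1.0,
            1.0 if genotype == "HP1-HP2" else 0.0,
            1.0 if genotype == "HP1-HP1" else 0.0,
            float(age),
            float(source)]


# Synthetic, noise-free data (assumed effect sizes, for illustration only).
assumed_ratio = {"HP2-HP2": 1.00, "HP1-HP2": 0.98, "HP1-HP1": 0.95}
rows, log_ldl = [], []
for genotype, ratio in assumed_ratio.items():
    for age in (45, 50, 60):
        for source in (0, 1):  # e.g. 0 = one study, 1 = the other
            ldl = 110.0 * ratio * math.exp(0.004 * age + 0.01 * source)
            rows.append(design_row(genotype, age, source))
            log_ldl.append(math.log(ldl))

beta = ols(rows, log_ldl)
ratio_hp1hp2 = math.exp(beta[1])  # geometric mean ratio, HP1-HP2 vs HP2-HP2
ratio_hp1hp1 = math.exp(beta[2])  # geometric mean ratio, HP1-HP1 vs HP2-HP2
print(round(ratio_hp1hp2, 4), round(ratio_hp1hp1, 4))
```

Converting such a ratio into a difference in mg dl–1 requires a reference geometric mean; how this conversion was done is not described in the text, so it is left out of the sketch.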

Results

The imputation quality in our study was high for all of the HP alleles in both data sets (r2 ranged from 0.84 to 0.92; Supplementary Table 1) except for the HP2SS allele, which had a near-zero frequency in our study population (Table 2). Haplotypes were composed of 19 SNPs within 20 kb flanking the deletion, and three major haplotypes, A, B and C, were observed (Supplementary Table 2). The HP1F allele was primarily observed on the haplotype B background, while the HP1S allele was observed almost exclusively on the haplotype A background. The HP2FS allele was observed on the backgrounds of both haplotypes A and B. When comparing populations of European descent with Chinese populations, the frequencies of the HP1–HP1, HP1–HP2 and HP2–HP2 genotypes were similar (Table 2). However, the HP1S allele was much more common in the Chinese population (30.64%) than in the European population (22.69%), and the HP1F allele was less common in Chinese (4.06%) than in Europeans (13.87%) (Table 2). The HP2SS allele frequency was 2.94% in the European population and 0.02% in the Chinese population (Table 2). In the present study, both the HP1F and HP1S alleles were 100% linked to the G allele of rs2000999, a SNP identified in a previous GWAS to be associated with cholesterol levels, while the HP2FS allele was observed on the background of both alleles of rs2000999, with frequencies of 43.02% for the HP2FS-G haplotype and 22.27% for the HP2FS-A haplotype (Supplementary Table 2).

Table 2 HP gene allele and genotype frequency distribution in European descendants and Chinese

Given the small sample size for some of the allele sub-groups, the association analysis was performed primarily for HP1HP1 and HP1HP2 genotypes compared with HP2HP2, the most common genotype (freq.=43.05%; Table 2).

The HP1HP1 genotype was significantly associated with lower total cholesterol and LDL cholesterol levels. Compared to women with the HP2HP2 genotype, individuals homozygous for the HP1 allele, on average, had a 4.24 mg dl–1 lower level of total cholesterol (P=0.02) and a 3.43 mg dl–1 lower level of LDL cholesterol (P=0.03) (Table 3). Furthermore, the HP1S–HP1S genotype, the most common HP1HP1 genotype (Table 2), was more strongly associated with a reduced level of total cholesterol and LDL cholesterol, with a mean difference of –5.59 mg dl–1 for total cholesterol (P=7.0 × 10–3) and –4.68 mg dl–1 for LDL cholesterol (P=8.0 × 10−3) when compared to the HP2HP2 genotype (Table 3).

Table 3 Association of HP genotypes with blood lipid levels in Chinese

We also found that the G allele of SNP rs2000999 was associated with decreased cholesterol, with a mean difference of −2.25 mg dl–1 for total cholesterol (P=0.02) and −1.60 mg dl–1 for LDL cholesterol (P=0.06) per G allele compared to the A/A genotype (Supplementary Table 3). These associations were no longer statistically significant after adjusting for the HP1/2 variant, while the associations for the HP1S–HP1S genotype remained statistically significant for total cholesterol (P=0.04) and marginally significant for LDL cholesterol (P=0.06) after adjusting for rs2000999.

A significant association of the HP1HP2 genotype with lower triglyceride levels was also observed; women carrying the HP1HP2 genotype had a 6.51 mg dl–1 lower triglyceride level than those homozygous for the HP2 allele (P=0.01; Table 3). We did not find any significant association of HP alleles with HDL cholesterol levels.

Discussion

In this study, we found that the HP1HP1 genotype, especially HP1S–HP1S, was significantly associated with lower total and LDL cholesterol levels in Chinese women. This result is consistent with the finding of Boettger et al.,5 who showed an inverse association of the HP1 allele with total and LDL cholesterol in individuals of European ancestry. Our study provides additional evidence supporting an important role of haptoglobin in regulating blood cholesterol levels. Furthermore, it supports the validity of using SNP haplotype imputation to analyze HP structural variations.

The HP deletion is in low linkage disequilibrium (r2=0.15) with rs2000999, a cholesterol-associated SNP identified in a previous GWAS. However, the cholesterol-lowering allele HP1 is 100% linked to the G allele of rs2000999. Because a high proportion of the cholesterol-increasing allele HP2 is also linked to the cholesterol-decreasing G allele of rs2000999, this SNP cannot entirely capture the association between HP alleles and blood cholesterol, which may explain our observation of a significant association of the HP1S–HP1S genotype with total and LDL cholesterol even after adjusting for rs2000999.
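The combination of complete linkage of HP1 to the G allele and a low r2 can be checked against the haplotype frequencies reported in the Results. The sketch below computes the standard pairwise linkage disequilibrium measures r2 and D′ from those frequencies. It makes two simplifying assumptions: the HP1F and HP1S frequencies are pooled into a single HP1-G haplotype, and the rare HP2SS allele (0.02%) is left out because its rs2000999 background is not given in the text. The function name and data layout are illustrative.

```python
def ld_stats(hap_freqs):
    """Return (r2, D') between the HP CNV and rs2000999.

    hap_freqs maps (HP allele, rs2000999 allele) to a haplotype frequency;
    frequencies are renormalized so that they sum to 1.
    """
    total = sum(hap_freqs.values())
    f = {hap: freq / total for hap, freq in hap_freqs.items()}
    p_hp1 = f.get(("HP1", "G"), 0.0) + f.get(("HP1", "A"), 0.0)
    p_g = f.get(("HP1", "G"), 0.0) + f.get(("HP2", "G"), 0.0)
    # D is the excess of the HP1-G haplotype over its expectation under independence.
    d = f.get(("HP1", "G"), 0.0) - p_hp1 * p_g
    r2 = d * d / (p_hp1 * (1 - p_hp1) * p_g * (1 - p_g))
    if d >= 0:
        d_max = min(p_hp1 * (1 - p_g), (1 - p_hp1) * p_g)
    else:
        d_max = min(p_hp1 * p_g, (1 - p_hp1) * (1 - p_g))
    return r2, d / d_max


# Haplotype frequencies (%) taken from the Results; HP1 = HP1F + HP1S.
freqs = {
    ("HP1", "G"): 4.06 + 30.64,
    ("HP2", "G"): 43.02,
    ("HP2", "A"): 22.27,
}
r2, d_prime = ld_stats(freqs)
print(round(r2, 2), round(d_prime, 2))
```

Under these assumptions, r2 comes out at approximately 0.15, in line with the value quoted above, while D′ equals 1 because the HP1-A haplotype is not observed. This illustrates why a SNP can be fully linked to one CNV allele and still be a poor proxy for the CNV as a whole.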

While the best-known function of haptoglobin is to bind free hemoglobin, it also interacts with apolipoprotein E (ApoE),5, 16, 17 a plasma protein that participates in the removal of chylomicron and very-low-density lipoprotein remnants from the blood as a ligand for LDL receptors, LDL receptor-related protein 1 and cell surface heparan sulfate proteoglycans.18, 19 Oxidation decreases the ability of ApoE to bind lipoproteins, impairing its ability to clear plasma lipids from the blood.20 Moreover, oxidation also induces a conformational change in ApoE, causing it to become unstable and prone to deposit on blood vessel walls.20 When haptoglobin binds to ApoE, it protects ApoE from oxidative damage.16 Proteins encoded by the HP1 allele have greater antioxidant capacity than HP2-encoded proteins,21 providing greater antioxidant protection when bound to ApoE proteins and potentially allowing ApoE proteins to remove lipoproteins from the blood more efficiently.

We also observed a significant association of the HP1HP2 genotype with lower triglyceride levels in this study. This is consistent with a previous report of a reduced triglyceride level in HP1 allele carriers compared to HP2 carriers.5 However, the long tail in the triglyceride density distribution shown in Supplementary Figure 1 indicates that there were outliers with much higher triglyceride levels that may have affected the analysis of triglyceride levels in relation to the HP CNV. The absence of a similar association with the HP1HP1 genotype also suggests that the HP1HP2 genotype association observed in our study might be a false association due to chance. Thus, this finding requires further confirmation.

There are some limitations in the present study. First, due to the low frequency of HP1F and HP2SS, we were unable to derive stable estimates for genotypes HP1F–HP1F and HP2SS–HP2SS. Therefore, we could not perform additional association analyses for these genotypes. It would be ideal to technically validate the imputed genotypes by the ddPCR technique. However, we were constrained by a lack of proper equipment and a limited budget. Nevertheless, any imputation error, if present, is most likely to be non-differential and would lead to an underestimation of the true effect. Second, only about half of the plasma samples included in our study were fasting samples, and the others were non-fasting samples. The latter could have compromised our ability to evaluate the association of HP polymorphisms with circulating triglyceride levels. In addition, the effect sizes observed in the present study are small, and a large sample size is needed to robustly validate these results. Finally, information on the use of lipid-lowering medications, especially statins, was unavailable in our study. However, our participants were recruited between 1997 and 2000, when the use of statins and other lipid-lowering medications was uncommon in China. The inability to adjust for the use of lipid-lowering medication in our analysis could reduce the statistical power of the study.

In summary, through SNP haplotype imputation, we extended the previously reported associations of the HP1/2 CNV with total and LDL cholesterol levels in European descendants to East Asian populations, providing further support for a significant role of haptoglobin in blood cholesterol levels. Our study also extends the validity of the novel HP imputation method developed by Boettger et al. to individuals of Chinese ancestry. Future studies could apply this method to densely genotyped data to efficiently study potential associations of haptoglobin polymorphisms with complex human phenotypes in populations of Asian ancestry.