Genome-wide association (GWA) studies have discovered multiple common genetic risk variants related to common diseases. It has been proposed that a number of these signals of common polymorphisms are based on synthetic associations that are generated by rare causative variants. We investigated if mutations in the low-density lipoprotein receptor (LDLR) gene causing familial hypercholesterolemia (FH, OMIM #143890) produce such signals. We genotyped 480 254 polymorphisms in 464 FH patients and in 5945 subjects from the general population. A total of 28 polymorphisms located up to 2.4 Mb from the LDLR gene were genome-wide significantly associated with FH (P<10⁻⁸). We replicated the 10 top signals in 2189 patients with a clinical diagnosis of FH and in 2157 subjects of a second sample of the general population (P<0.000087). Our findings confirm that rare variants are able to cause synthetic genome-wide significant associations, and that they exert this effect at relatively large distances from the causal mutation.
Genome-wide association (GWA) studies comparing hundreds of thousands of common polymorphisms between persons with and without disease have successfully identified genetic risk variants with frequencies >5%. These genetic variants are expected to tag frequent causal variants, and it is considered unlikely that they are coupled to mutations having frequencies far below 1%.1, 2 However, Dickson et al3 performed intriguing simulation studies, suggesting that stochastically occurring coupling between common polymorphisms and rare causal variants can give rise to synthetic association signals in GWA studies. By exploring two autosomal recessive disorders (hearing loss and sickle cell anemia), they showed that GWA signals occur when multiple rare mutations are coupled to a limited number of common polymorphisms. In addition, mutations in the nucleotide-binding oligomerization domain-containing protein 2 (NOD2) gene produced synthetic association signals in patients with Crohn’s disease, and rare variants of hypertriglyceridemia were identified by a GWA study.4, 5 Remarkably, the functional variants can be located at a large distance (2.5 Mb) from the synthetic association signal. The relevance of synthetic signals has been questioned, and the examples may be considered anecdotal.6, 7, 8 This view is supported by extrapolations from preliminary data of the 1000 Genomes Project, showing that associations can be synthetic but that this will hardly occur in complex disorders.9, 10 However, these findings are difficult to interpret, because this phase of the project had power to identify only variants with a frequency of >1% and because undeniable causal variants were lacking.
On the basis of our nationwide molecular screening program for familial hypercholesterolemia (FH, OMIM #143890), the rare causal mutation has been identified in a large population displaying extensive heterogeneity at the low-density lipoprotein receptor (LDLR) gene.11 Hence, we had the opportunity to test whether well-known rare variants of the LDLR gene causing an autosomal dominant disorder create synthetic associations in real data of independent samples.
In the present study, we investigated whether replicable synthetic associations occur based on mutations in the LDLR gene causing FH.
Materials and methods
The ‘Association study of coronary heart disease (CHD) Risk factors in the Genome using an Old-versus-young Setting’ (ARGOS-NL) population consisted of 500 patients, who were selected from 17 000 FH patients with an identified causal mutation in the LDLR gene (segregating in the families and not present in controls) from the nationwide molecular screening program for FH. The scope of the original ARGOS study was to identify previously unknown risk factors for CHD. Phenotypic data were acquired from general practitioners and by reviewing medical records at the lipid and cardiology clinics. We selected the 264 youngest patients with severe premature CHD (defined as myocardial infarction, coronary artery bypass grafting (CABG) and percutaneous transluminal coronary angioplasty (PTCA)) and the 236 oldest patients without any sign or symptom of CHD, stratified for sex. At a minimum, all first-, second- and third-degree relatives were excluded from the study using the pedigree information. After exclusion of family members and of patients whose DNA was not available, 464 FH subjects were available as cases for the GWA study. The control group consisted of 5945 subjects of the Rotterdam study (RS-I), from which FH patients were not excluded. The Rotterdam Study is a population-based cohort described in detail elsewhere.12 All cases and controls were of apparent Caucasian descent. All subjects gave informed consent and the ethics institutional review board approved the study protocols.
Illumina Infinium HumanHap550K chips (Illumina, San Diego, CA, USA) were used to genotype the subjects of the ARGOS population and the Rotterdam study. Polymorphisms were excluded if they (i) were unsuccessfully genotyped in 2% of all subjects, (ii) had a minor allele frequency <5% or (iii) deviated from Hardy–Weinberg equilibrium (P<0.0001). For each polymorphism that passed the quality control in both populations, we compared allele frequencies between ARGOS and the Rotterdam study using Plink (version 1.06).13 Cross-tabulation and stepwise forward regression analysis of genome-wide significant polymorphisms in the LDLR gene region were performed using SPSS version 15.0 (IBM, Armonk, NY, USA). After identity-by-descent (IBD) analysis in Plink version 1.06 (Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA, and the Broad Institute of Harvard & MIT, Cambridge, MA, USA), a multidimensional scaling (MDS) plot was made with R version 2.8.1 (University of Auckland, Auckland, New Zealand). Haplotypes were constructed with the haplo.stats package in R version 2.8.1.
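The quality-control filters and the allele-frequency comparison described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Plink implementation: the function names, the genotype-count inputs and the example counts are hypothetical, the genotyping filter is assumed to act on the fraction of missing calls, and both the Hardy–Weinberg check and the association test are taken to be simple chi-square tests with 1 degree of freedom.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def hwe_pvalue(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2_1df_pvalue(stat)


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              max_missing=0.02, min_maf=0.05, min_hwe_p=1e-4):
    """Apply the three filters; thresholds mirror the text, the handling of
    values exactly at a threshold is an assumption."""
    n_called = n_aa + n_ab + n_bb
    missing_rate = n_missing / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    return (missing_rate <= max_missing
            and maf >= min_maf
            and hwe_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p)


def allelic_test(case_counts, control_counts):
    """2x2 chi-square test on allele counts.

    Each argument is a tuple (n_aa, n_ab, n_bb). Returns the statistic,
    its P-value and the odds ratio for allele A (cases versus controls).
    """
    a = 2 * case_counts[0] + case_counts[1]        # allele A in cases
    b = 2 * case_counts[2] + case_counts[1]        # allele B in cases
    c = 2 * control_counts[0] + control_counts[1]  # allele A in controls
    d = 2 * control_counts[2] + control_counts[1]  # allele B in controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("allele table has an empty margin")
    stat = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return stat, chi2_1df_pvalue(stat), odds_ratio


if __name__ == "__main__":
    # Hypothetical genotype counts, for illustration only.
    cases = (40, 180, 244)
    controls = (150, 1600, 4195)
    print(allelic_test(cases, controls))
```

In practice a polymorphism would be tested only if `passes_qc` holds in both the case and the control sample, which corresponds to the requirement that it pass quality control in both populations.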
To replicate our results, we compared the genotype frequencies of the ten most significant polymorphisms between a second cohort of 2189 patients with clinical FH and another population-based control sample consisting of 2157 subjects of the Rotterdam study (RS-II).12, 14 The above-described Illumina chips were used to genotype the subjects of the RS-II. In the second FH cohort, the polymorphisms were genotyped using TaqMan (Applied Biosystems, Foster City, CA, USA). The replication analyses were performed with SPSS version 15.0. All subjects gave informed consent and the ethics institutional review board approved the study protocols.
We compared the frequencies of 480 254 polymorphisms between 464 FH patients (ARGOS-NL) and 5945 subjects from the general population (RS-I). Overall, 28 polymorphisms were genome-wide significantly associated with FH (Supplementary Figure 1). Of these, 13 polymorphisms were located up to 1 Mb upstream or downstream from the LDLR gene (Table 1 and Supplementary Figure 1). In addition, a number of signals were even further away from this gene: 15 polymorphisms were located up to 2.4 Mb from the gene (Table 1). Remarkably, the polymorphisms within the LDLR gene were not associated with FH. Entering the polymorphisms in a stepwise forward logistic regression model demonstrated that the rs2304240, rs4804149, rs387865 and rs4804636 polymorphisms were independently associated with FH, with P-values ranging from 2.21 × 10⁻³⁴ to 1.61 × 10⁻¹⁰ (Table 1). The genomic inflation factor was 1.20. Correction for up to 20 principal components slightly decreased the genomic inflation factor (1.19). The largest cluster found by IBD analysis consisted of five persons, and the maximum IBD shared between these persons was 4%. Adjustment for age, which might be considered an adjustment for group, lowered the inflation factor to 1.03, while the association between the polymorphisms on chromosome 19 and FH remained significant with similar results.
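For readers unfamiliar with the genomic inflation factor, it is conventionally obtained by dividing the median of the observed association statistics by the median expected under the null hypothesis. The sketch below assumes that each polymorphism contributes one chi-square statistic with 1 degree of freedom; the function name and the input list are hypothetical and are not taken from the study's analysis scripts.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Ratio of the observed median test statistic to its null expectation.

    Values close to 1 suggest little inflation; values well above 1 can
    point to stratification, relatedness or other systematic differences
    between cases and controls.
    """
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

Applied to the full set of genome-wide statistics, a value such as the 1.20 reported here would mean that the median statistic is 20% larger than expected by chance.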
Replication of results
We compared the genotype frequencies of the 10 most significant polymorphisms between a second FH cohort of 2189 patients and 2157 subjects of the general population (RS-II): all 10 polymorphisms on chromosome 19 were significantly associated with FH (all P<10⁻⁴, Supplementary Table 1). An LDLR mutation had been identified in 1258 patients of this second FH cohort. Analyses restricted to these mutation carriers resulted in higher odds ratios with lower P-values, whereas analyzing clinical FH patients in whom no LDLR mutation was detected (n=931) did not result in replication with the exception of rs4804149 (Supplementary Table 2).
Associated polymorphisms do not identify patients with (familial) hypercholesterolemia
As the presence of LDLR mutations produced strong signals in our GWA study, we hypothesized that these signals might identify subjects with FH in the general population. The prevalence of FH is ∼1:500.11 This implies that about 12 of the 5945 subjects of RS-I and about 4 of the 2157 subjects of RS-II were expected to carry an LDLR mutation. However, an MDS plot based on the DNA similarity at the LDLR gene ±1 Mb between subjects from ARGOS and all controls (RS-I and RS-II) did not show any pattern that distinguished carriers of LDLR mutations from non-carriers.
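The expected numbers of carriers quoted above follow directly from the prevalence. A minimal sketch of the arithmetic, with a hypothetical helper name and the prevalence written as 1 in 500:

```python
def expected_carriers(sample_size, one_in=500):
    """Expected number of mutation carriers at a prevalence of 1 in `one_in`."""
    return sample_size / one_in


for cohort, n in (("RS-I", 5945), ("RS-II", 2157)):
    print(cohort, round(expected_carriers(n), 1))
# prints:
# RS-I 11.9
# RS-II 4.3
```

Rounded to whole persons, this gives the 12 and 4 expected carriers mentioned in the text.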
We constructed four haplotypes from rs2304240, rs4804149, rs387865 and rs4804636, the polymorphisms that were independently associated with FH. These haplotypes were associated with having an LDLR mutation when comparing ARGOS to RS-I and RS-II (P<2.61 × 10⁻⁵). Within both RS samples, the haplotypes were not associated with LDL cholesterol level or with a possible FH phenotype, defined as total cholesterol level >7 mmol/l and triglyceride level <4 mmol/l, or LDL cholesterol level >6 mmol/l and a history of premature CHD (Supplementary Table 3).
As each LDLR mutation might cosegregate with different alleles of common polymorphisms, we compared the genotype frequencies of the polymorphisms found by GWA between the subjects of the ARGOS study and RS-I, stratified for all mutations that were present in 10 or more patients. Different mutations indeed appeared to cosegregate with different alleles of the polymorphisms, as indicated by the absence of homozygosity for the wild-type allele. The results for rs3745264 and rs2304240, the two most significant hits of our study, are shown as an example in Table 2. The cosegregation of a mutation with specific alleles of common polymorphisms was most pronounced in carriers of the LDLR mutation 191-2 in intron 2 (association with the minor allele in seven polymorphisms), S285L in exon 6 (association with the minor allele in nine polymorphisms) and a large deletion of 2.5 kb from exon 7–8 called the Cape Town-2 mutation (association with the minor allele in eight polymorphisms). This cosegregation is confirmed by the disappearance of the associations between FH and the two polymorphisms in Table 2 after removing all carriers of these mutations from the analyses: P=0.38 for rs3745264 (191-2, 313+1/2, S285L, G322S and Cape Town-2 removed) and P=0.95 for rs2304240 (191-2, 313+1/2, S285L, Cape Town-2, 1359-1 and L590F removed).
We searched for these particular combinations of minor alleles in persons with a possible FH phenotype in the Rotterdam study and identified seven possible FH patients. Sequencing of the LDLR gene including the promoter region and splice sites, as well as the lipid profiles, did not identify FH patients among these subjects.
In this study, we demonstrate that LDLR mutations give rise to multiple independent synthetic associations and that these signals occur up to 2.4 Mb upstream and downstream of the LDLR gene. Using the associated polymorphisms, we were not able to identify FH patients among the general population.
The synthetic associations in our study could have been the result of cryptic family relations or population stratification. The genomic inflation factor (1.20) was relatively high, which indicates that there might be population stratification. Correction for up to 20 principal components only slightly decreased the genomic inflation factor (1.19) and did not change the association between the polymorphisms and FH.15 In addition, our selection procedure for ARGOS aimed at genetic heterogeneity at the LDLR locus: we took a sample from our nationwide screening program and excluded relatives, resulting in identification of 96 different LDLR mutations in 464 FH patients. The most frequent mutation (1359-1 splice defect in intron 9) was found in 14%, which is similar to its frequency in the overall Dutch FH population. To further explore cryptic family relations, we used an IBD analysis. The largest cluster consisted of five persons, and the maximum IBD shared between these persons was low. Taken together, this confirms that we succeeded in excluding relatives and that even distant relationships did not influence our results. Therefore, random inflation by cryptic family relations or population stratification is a very unlikely explanation for the 28 GWA signals at the LDLR locus.
Remarkably, the polymorphisms within the LDLR gene were not associated with FH. Particular mutations were associated with specific polymorphisms within the LDLR gene, but the numbers of carriers were too small to produce a genome-wide significant signal. Alternatively, one could argue that these polymorphisms did not tag the mutations at all because of their location outside the linkage disequilibrium blocks.
We also found a genome-wide significantly associated polymorphism that was located not on chromosome 19 but on chromosome 16. However, genotyping of this polymorphism did not result in distinct genotype clusters, indicating a false-positive result.
As the presence of an LDLR mutation produced signals in our GWA study, we hypothesized that these signals might identify subjects with FH in the general population. An MDS plot based on the DNA similarity at the LDLR gene ±1 Mb between subjects from ARGOS and the Rotterdam study did not show any pattern that distinguished carriers of LDLR mutations from non-carriers. As an alternative to this MDS analysis, we tried to identify subjects with FH in the Rotterdam study by constructing haplotypes of the independently associated polymorphisms. As expected, some of these haplotypes were associated with having an LDLR mutation when comparing ARGOS to the Rotterdam study. However, within the Rotterdam study none of the haplotypes was associated with LDL cholesterol level or possible FH phenotype. As neither the MDS analysis nor the haplotype analyses were able to identify FH subjects in the Rotterdam study, we hypothesized that different mutations might cosegregate with different alleles of the polymorphisms. We therefore compared the genotype frequencies of the significantly associated polymorphisms between carriers of the mutations in ARGOS that were present in 10 or more patients. Different mutations indeed appeared to cosegregate with different alleles of the polymorphisms. The significance of these associations was most likely driven by the fact that specific mutations cosegregated with the minor allele of the polymorphism, making it the common allele among carriers of these mutations. For some of the more frequent mutations, cosegregation with the rare allele of a polymorphism at the LDLR locus was even more certain, as homozygosity for the common allele of the polymorphism was absent among the carriers of the mutation. However, using this information did not lead to identification of FH patients in the general population.
Different mutations cosegregated with different alleles, but of course the power to detect specific LDLR mutations in the general population was limited.
Our GWA study picked up genetic heterogeneity at a locus of an exemplary Mendelian trait, supporting the idea that multiple rare variants produce synthetic GWA signals. This suggests that we have opportunities to detect rare variants in a GWA study. Our study cannot be used to estimate what proportion of heritability will be explained by rare variants.16 We merely show that strong signals may appear if rare, undeniably causal variants are present. A very large population (∼250 000 persons) is required to identify synthetic signals of LDLR mutations with similar power in a population-based setting. Sequencing of large populations is still too expensive. Perhaps we should enrich the current sequencing initiatives with our cohorts containing monogenic disorders to gain more in-depth insight into the behavior of mutated loci. The phenomenon of synthetic associations was first described by Dickson et al.3 Recently, Anderson et al7 suggested that the consistency of GWA results in different populations is an argument against involvement of synthetic associations. We clearly replicated the signals in an independent sample, in line with consistency among populations, but the cases and controls were again Caucasian. It would be of great interest to know whether our results would be consistent in populations of other ethnicities. The distributions of the LDLR mutations differ between populations, and the polymorphisms underlying the synthetic association signals probably differ as well. Unfortunately, cohorts of genetically heterogeneous FH patients from other populations are currently not large enough for this analysis.
Wray et al6 argued that the allele frequencies of the signals in GWA studies are too high to be explained by rare variants. In the initial and the replication phase, however, we found that a large number of different mutations in the LDLR gene produced a wide range in minor allele frequencies (MAFs) of all polymorphisms found, including the independently associated polymorphisms (rs2304240, rs4804149, rs387865, rs4804636). Although overall the MAFs varied over a wide range, there might be a trend toward higher MAFs of the polymorphisms located <1 Mb from the LDLR gene and lower MAFs of the polymorphisms located further away.
We confirm that mutations causing a monogenic disorder give rise to multiple independent synthetic associations and that these signals occur up to 2.4 Mb away from the rare functional variants. However, we could not simplify molecular diagnostics of FH with these synthetic associations.
About this article
Supplementary Information accompanies the paper on the European Journal of Human Genetics website (http://www.nature.com/ejhg)