Introduction

We recently reported the results of a genome-wide linkage scan for premature atherosclerotic coronary artery disease (CAD) based on 4175 affected subjects from 1933 families.1 Despite the large size of the study, the strongest result was suggestive evidence for linkage on chromosome 2, with a multipoint non-parametric logarithm of odds (LOD) score of 1.98 in an initial scan (pointwise P=0.0014, genome-wide P=0.31, based on observing at least one LOD score of 1.98 or higher in 310 out of 1000 simulations) falling to 1.86 with the addition of extra markers in the region. Failure to find evidence of significant linkage at the genome-wide level is common in the analysis of complex diseases. This may be because the disease is influenced by interactions between distinct genetic loci, genetic heterogeneity or both.2, 3 Genetic heterogeneity may explain why, among the analyses showing suggestive or significant linkage, a relatively small proportion has so far been replicated, even when estimates of sib recurrence risk ratio are high.4, 5, 6

One way of resolving the problem of heterogeneity is to use a more restrictive definition of phenotype at recruitment or to separately analyse subsets of the study participants with more homogeneous disease characteristics. For example, Wang et al.7 reported significant evidence of linkage to a locus on chromosome 1 by considering subjects with myocardial infarction (MI), diagnosed by age 45 years in males or 50 years in females. Applying narrow criteria at recruitment involves strong assumptions about which phenotypic groups are likely to be genetically homogeneous. On the other hand, separate analysis of subsets of study participants may lack statistical power to detect linkage. An alternative approach is to include suspected sources of heterogeneity as covariates in the statistical model. The use of covariates in linkage analysis has contributed to mapping or confirming the position of genes for prostate cancer,8, 9 late-onset Alzheimer disease10, 11 and recurrent early-onset depression.12 In this study, we report the results of a sibling pair linkage analysis of CAD on chromosome 2 in the British Heart Foundation (BHF) Family Heart Study, with hypercholesterolemia included as a covariate.

Materials and methods

Study subjects

Data collection has been described in detail elsewhere.1 Briefly, the BHF Family Heart Study comprises 1933 families including 4175 persons affected with CAD (1675 affected sibling pairs, 220 affected trios and 38 sibships or extended families with more than three affected individuals). Affection with CAD was defined as having had one or more of these conditions before age 66 years: MI, angina, percutaneous transluminal coronary angioplasty or coronary artery bypass surgery. Clinical information available for all or the majority of patients includes body mass index (BMI), age of CAD onset, and self-reported histories of smoking, hypercholesterolemia, diabetes and hypertension. Study participants reported clinical information about themselves by completing a questionnaire. For hypercholesterolemia, participants were asked whether their general practitioner (family doctor) had ever told them that they had a high cholesterol level and whether they had ever taken any lipid-lowering treatment. Four hundred and sixteen microsatellites were genotyped1 and the present covariate analysis is based on 38 of them, which cover chromosome 2. The study was approved by the Yorkshire Multicentre Research Ethics Committee and by 206 local research ethics committees across the UK.

Statistical analysis

To investigate the effect of heterogeneity on linkage of CAD to chromosome 2, regression analyses were carried out expressing allele-sharing probabilities as a function of a series of covariates, using the method proposed by Rice13 and extended by Holmans.12, 14 The probability P that a pair of siblings share identical-by-descent (IBD) an allele inherited from a particular parent at a putative disease locus is expressed as a function of the covariate x
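As a sketch of the kind of relation meant here, assuming the logistic form that is standard for this regression approach (the symbols \alpha for the intercept and \beta for the covariate coefficient are notational assumptions), the model can be written as:

```latex
% Sketch only: standard logistic link between the IBD-sharing probability P
% and a covariate x; \alpha and \beta are assumed notation.
P = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}
\quad\Longleftrightarrow\quad
\log\frac{P}{1-P} = \alpha + \beta x
```

Under this form, \alpha = 0 and \beta = 0 give P = 1/2, the sharing probability expected in the absence of linkage.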

The test statistic for linkage allowing for the covariate effect is based on a ratio of likelihoods:
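A likelihood-ratio LOD score of this kind can be sketched as follows, assuming a logistic model with intercept \alpha and covariate coefficient(s) \beta (assumed notation), in which the null hypothesis of no linkage corresponds to \alpha = 0 and \beta = 0, i.e. P = 1/2 for every pair:

```latex
% Sketch only: base-10 log of the ratio of the maximised likelihood
% to the likelihood under no linkage (P = 1/2 for all pairs).
\mathrm{LOD} = \log_{10}
\frac{L(\hat{\alpha}, \hat{\beta})}{L(\alpha = 0,\ \beta = 0)}
```

Because both \alpha and \beta are estimated, the statistic has more free parameters than a classic one-parameter LOD score.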

Binary variables such as hypercholesterolemia give rise to a three-level factor β for each sibling pair (0, 1 or 2 hypercholesterolemics in the pair). Multipoint posterior IBD probabilities (based on the marker genotypes) were computed using the program Allegro.15 Regression analysis was carried out (computation of the LOD score of Eq. 2) with version 9 of STATA16 using programs made available by David Clayton (http://www-gene.cimr.cam.ac.uk/clayton/software/stata/ibdreg). As some families had more than two affected sibling pairs (ASPs), the logistic regression analysis was clustered on family and robust variance estimation was used to adjust for familial correlation.

The LOD score associated with this method (Eq. 2) is not directly comparable to a classic LOD score because of the larger number of degrees of freedom. In order to assess the genome-wide significance level, 1000 data sets were therefore simulated under the null hypothesis of no linkage with the same family structures as in the real data using the program MERLIN.17 In these simulations, genotypes were randomly assigned to family founders at all markers based on allelic frequencies in the sample, assuming Hardy–Weinberg and linkage equilibrium. Alleles were then transmitted from one generation to the next according to Mendelian laws using the true genetic map. As we are interested in testing the significance of overall linkage including the covariate effect, only genotypes were randomised whereas the covariate was kept unchanged. The simulated data were analysed using the same method as the real data. In addition to the covariate analysis, the original multipoint non-parametric linkage analysis based on the Sall statistic using the program Allegro15, 18 was repeated in two subsets of the data: both sibs in the pair with hypercholesterolemia and neither sib in the pair with hypercholesterolemia. The association between hypercholesterolemia, CAD subtypes and other risk factors was examined using χ2 tests for binary factors and a two-sided t-test for continuous traits.

Results

Hypercholesterolemia data

The clinical data are summarised in Table 1. The majority of subjects (3000, 72%) reported themselves as being hypercholesterolemic and taking cholesterol-lowering treatment. Among those who said they were not receiving treatment, a further 384 reported that they were aware of having hypercholesterolemia, and 674 indicated that they were not hypercholesterolemic. There were also 44 persons reporting treatment who said they had not been told they were hypercholesterolemic. The remaining 73 did not know their hypercholesterolemic status. As the proportion of somewhat ambiguous self-reports is not high (10.3%, i.e. 384 who were hypercholesterolemic but not treated and 44 who were non-hypercholesterolemic but under treatment), the analysis was carried out using the entire sample, excluding only the 73 with missing data (1.7%). The analysis was thus based on 4102 persons (full sample, i.e. 3384 hypercholesterolemic and 718 non-hypercholesterolemic) (Table 1). This gave rise to 108 pairs of non-hypercholesterolemic sibs, 605 pairs where only one sib was hypercholesterolemic and 1678 pairs where both were hypercholesterolemic. A sensitivity analysis was carried out excluding the 10.3% with ambiguous self-reports (reduced sample).
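The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the dictionary keys are illustrative labels, not the study's variable names, and the grouping into hypercholesterolemic and non-hypercholesterolemic is the one implied by the totals 3384 and 718.

```python
# Self-reported categories as quoted in the text (labels are illustrative).
counts = {
    "told_high_and_treated": 3000,
    "told_high_not_treated": 384,
    "not_high_not_treated": 674,
    "treated_not_told_high": 44,
    "status_unknown": 73,
}

total = sum(counts.values())
analysed = total - counts["status_unknown"]

# Grouping implied by the text: 3384 = 3000 + 384 and 718 = 674 + 44.
hyperchol = counts["told_high_and_treated"] + counts["told_high_not_treated"]
non_hyperchol = counts["not_high_not_treated"] + counts["treated_not_told_high"]

# "Ambiguous" self-reports: told high but untreated, or treated but not told.
ambiguous = counts["told_high_not_treated"] + counts["treated_not_told_high"]


def percent(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


ambiguous_pct = percent(ambiguous, total)
missing_pct = percent(counts["status_unknown"], total)

print(total, analysed, hyperchol, non_hyperchol)
print(ambiguous_pct, missing_pct)
```

The totals reproduce the 4175 recruited and 4102 analysed subjects, and the two percentages match the 10.3% and 1.7% quoted above.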

Table 1 Self-reported status on hypercholesterolemia covariate

Covariate analysis of sibling pairs affected by CAD

The maximum LOD score obtained in the analysis of ASPs using the full sample and including hypercholesterolemia as a covariate is 4.4 (Figure 1), between markers D2S112 and D2S1326, in the same region as reported previously.1 In 1000 simulations under the null hypothesis of no linkage, a LOD score peak of 4.4 or higher was observed at least once in 40 simulations, giving a genome-wide P-value of 0.04. (In fact, in each of these 40 simulations only one such peak was observed if, following Holmans et al.,12 two peaks are considered to define two different linkage regions when they are separated by 30 cM or more.)
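The empirical significance level is simply the fraction of null replicates whose maximum LOD score reaches the observed peak. A minimal sketch follows; the function names are hypothetical and the replicate values are placeholders chosen so that 40 of 1000 reach 4.4, not the study's simulated data. The binomial standard error is an added illustration of the Monte Carlo uncertainty and is not reported in the study.

```python
from math import sqrt


def empirical_p_value(simulated_max_lods, observed_lod):
    """Fraction of null replicates whose maximum LOD is >= the observed peak."""
    if not simulated_max_lods:
        raise ValueError("need at least one simulated replicate")
    hits = sum(1 for lod in simulated_max_lods if lod >= observed_lod)
    return hits / len(simulated_max_lods)


def binomial_standard_error(p, n):
    """Monte Carlo standard error of a proportion p estimated from n replicates."""
    return sqrt(p * (1 - p) / n)


# Placeholder replicates (NOT the study's simulations): 40 of 1000 reach 4.4.
placeholder_max_lods = [4.5] * 40 + [1.0] * 960

p_value = empirical_p_value(placeholder_max_lods, observed_lod=4.4)
std_err = binomial_standard_error(p_value, len(placeholder_max_lods))

print(p_value)            # 0.04
print(round(std_err, 4))  # 0.0062
```

With 1000 replicates, an estimate of 0.04 carries a Monte Carlo standard error of about 0.006.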

Figure 1

LOD scores of linkage to CAD based on the likelihood ratio test including hypercholesterolemia as a covariate (plain line) and not including any covariate (dotted line) on chromosome 2. The dashed vertical lines indicate the 1-LOD support interval. The position of three candidate genes is indicated. As the markers are not evenly spaced, the x axis is not drawn to scale. Positions of the closest markers to these genes and to the peak LOD score are: 117.9 cM for D2S160, 145.0 cM for D2S1326, 158.8 cM for D2S142 and 172.5 cM for D2S335.

At the position of the highest LOD score, the effect of the covariate itself is highly significant. Taking ASPs both of whom are hypercholesterolemic as the baseline, the estimate of the coefficient β0 for ASPs neither of whom is hypercholesterolemic is 0.66 (95% confidence interval (CI): 0.37–0.94, P<0.001), whereas no difference was found in ASPs with one hypercholesterolemic and one non-hypercholesterolemic sibling compared with the same baseline (β1=0.07, 95% CI: −0.09–0.22, P=0.39, Table 2). This indicates greater allele-sharing at this locus in ASPs neither of whom is hypercholesterolemic compared with other ASPs.
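Because the regression is logistic, the coefficients can be read as log odds ratios for sharing an allele IBD, relative to the baseline of pairs in which both sibs are hypercholesterolemic. The sketch below exponentiates the estimates and confidence limits quoted above; it assumes the quoted values are on the log-odds scale, and the dictionary labels are illustrative.

```python
from math import exp

# (coefficient, lower 95% limit, upper 95% limit) as quoted in the text;
# assumed to be on the log-odds scale of the logistic regression.
estimates = {
    "neither sib hypercholesterolemic": (0.66, 0.37, 0.94),
    "one sib hypercholesterolemic": (0.07, -0.09, 0.22),
}


def to_odds_ratio(beta, lower, upper):
    """Exponentiate a log-odds coefficient and its confidence limits."""
    return tuple(round(exp(value), 2) for value in (beta, lower, upper))


odds_ratios = {label: to_odds_ratio(*values) for label, values in estimates.items()}

for label, (ratio, low, high) in odds_ratios.items():
    print(f"{label}: OR = {ratio} (95% CI {low} to {high})")
```

On this reading, the odds of IBD sharing at the peak are roughly doubled in pairs without hypercholesterolemia (odds ratio about 1.93), whereas the interval for discordant pairs includes 1.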

Table 2 Parameter estimates from the linkage analysis with hypercholesterolemia as a covariate at the locus on chromosome 2 showing most evidence of linkage

Allele sharing estimates and subsets analysis

Siblings affected by CAD without hypercholesterolemia share on average 1.25 alleles IBD (95% CI: 1.15–1.36, Table 3) at the position of the LOD score peak. This IBD sharing is higher than in siblings both of whom have hypercholesterolemia (1.00 alleles, 95% CI: 0.98–1.04) and in sibling pairs only one of whom is hypercholesterolemic (1.03 alleles, 95% CI: 0.99–1.09), neither of which differs significantly from the sharing expected under the null hypothesis of no linkage. In summary, only sibling pairs without hypercholesterolemia show evidence of linkage to the locus on chromosome 2. This is further illustrated by the subset analysis: based on the Sall statistic computed using the program Allegro,15, 18 a maximum LOD score of 3.74 (pointwise P=0.000017) was obtained at the same locus as previously when only non-hypercholesterolemic pairs were considered, whereas the peak was much smaller in the subset of pairs both of whom are hypercholesterolemic (LOD=1.09, Figure 2). In non-hypercholesterolemic patients, the probability of sharing one allele IBD at the locus of the highest LOD is 0.54 (95% CI: 0.46–0.61) and that of sharing two alleles IBD is 0.36 (95% CI: 0.28–0.44, see Table 3). Based on the probability of sharing no allele between affected siblings,19 the estimate of the sibling recurrence risk ratio (λs) from this locus in non-hypercholesterolemic subjects is 2.5, although it should be borne in mind that such estimates are approximate and dependent on ascertainment.20, 21
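The λs estimate follows from the IBD probabilities by simple arithmetic. The sketch below assumes the standard relation for affected sibling pairs, z0 = 0.25/λs, where z0 is the probability of sharing no allele IBD (taken here to be the relation behind reference 19); the function names are hypothetical.

```python
def prob_share_zero(z1, z2):
    """Probability of sharing 0 alleles IBD, given P(share 1) and P(share 2)."""
    return 1.0 - z1 - z2


def mean_alleles_shared(z1, z2):
    """Expected number of alleles shared IBD."""
    return z1 + 2.0 * z2


def sib_recurrence_risk_ratio(z0):
    """lambda_s from the assumed relation z0 = 0.25 / lambda_s."""
    if z0 <= 0:
        raise ValueError("z0 must be positive")
    return 0.25 / z0


# Point estimates quoted in the text for non-hypercholesterolemic pairs.
z1, z2 = 0.54, 0.36

z0 = prob_share_zero(z1, z2)
mean_shared = mean_alleles_shared(z1, z2)
lambda_s = sib_recurrence_risk_ratio(z0)

print(round(z0, 2), round(mean_shared, 2), round(lambda_s, 1))  # 0.1 1.26 2.5
```

The rounded probabilities give a mean of 1.26 alleles shared, in line with the 1.25 reported above once rounding of the published probabilities is allowed for, and λs = 0.25/0.10 = 2.5.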

Table 3 IBD probabilities and average number of alleles shared at the position of the highest LOD score in sibling pairs affected by CAD according to the number of hypercholesterolemics in the pair

Figure 2

Non-parametric LOD scores computed along chromosome 2 using Allegro15 in separate subsets of non-hypercholesterolemic (plain line) and hypercholesterolemic sibling pairs (dotted line). The dashed vertical lines indicate the 1-LOD support interval for the non-hypercholesterolemic pairs subset. The position of three candidate genes is indicated. As the markers are not evenly spaced, the x axis is not drawn to scale. Positions of the closest markers to these genes and to the peak LOD score are: 117.9 cM for D2S160, 145.0 cM for D2S1326, 158.8 cM for D2S142 and 172.5 cM for D2S335.

Sensitivity analysis

The analysis based on exclusion of subjects with ambiguous reports of hypercholesterolemia status (Table 1) confirmed linkage to the same locus with a LOD score of 4.6 for linkage including hypercholesterolemia as a covariate. The number of ASPs with two non-hypercholesterolemics reduced only slightly, from 108 to 102 (474 ASPs with one hypercholesterolemic and 1340 ASPs with two hypercholesterolemics).

Association between hypercholesterolemia, CAD and its other risk factors

Table 4 shows that non-hypercholesterolemic subjects are more likely to have had an MI than those with hypercholesterolemia (65% compared with 61%, P=0.037). The frequency is higher still in the 108 sibling pairs both of whom are non-hypercholesterolemic (69%, P=0.023, when compared to all hypercholesterolemics). Non-hypercholesterolemics are also more likely to be male, less likely to have diabetes or hypertension, and have a slightly lower BMI and older age at disease onset than hypercholesterolemics, although the differences are small (Table 4). The association of hypercholesterolemia with other factors raises the possibility that the effect on linkage is attributable to an associated factor or factors, rather than hypercholesterolemia itself. To investigate this, each of the factors from Table 4 was considered as an independent covariate in the linkage analysis, but none showed any evidence of heterogeneity associated with the chromosome 2 locus (data not shown). This argues strongly against that explanation.

Table 4 Association between hypercholesterolemia, MI and other risk factors

Discussion

Genetic heterogeneity has a major impact on the power to map genes for complex diseases. The results reported in this study suggest that the absence of significant evidence of linkage of CAD to chromosome 2 in the original analysis,1 despite the large sample size, may be due to genetic heterogeneity associated with hypercholesterolemia. Using hypercholesterolemia as a covariate, we found a locus with a LOD score of 4.4 (Figure 1) with a genome-wide P-value of 0.04. The effect size of the covariate is positive and highly significant (0.66, P<0.001, see Table 2) in sibling pairs both of whom are unaffected by hypercholesterolemia compared with sibling pairs both of whom are hypercholesterolemic, and this result was also illustrated by subset analysis and estimation of allele sharing.

Covariate definition is an important issue. In our study, hypercholesterolemia status was based for the majority of subjects on self-report of whether they had ever received treatment for this condition. Treatment may be given as a preventative measure, but we would expect the majority of subjects who have never received treatment to be, or to have been at the time of their diagnosis, free from hypercholesterolemia, although some misclassification is of course possible. Measures of total cholesterol were also available for a small number of subjects, but these measures were taken in most cases while on treatment and also many years (up to 39 years) after their first CAD event, so that they are unlikely to reflect the lipid profile that prevailed when they were first diagnosed and started treatment. We therefore did not make use of these measures in this study.

Six additional covariates (sex, age of onset, diabetes, smoking, hypertension and BMI) were also analysed as potential sources of heterogeneity, but none of these showed any evidence of heterogeneity associated with the chromosome 2 locus (data not shown). As we did not correct for the number of covariates tested, the results presented here should be treated with some caution. However, it is of interest that there is other evidence of linkage of CAD to this region: a LOD score of 3.7 was reported in an isolated Finnish population22 and a LOD score of 3.82 was found in the same region using a US cohort of MI patients.7 In the Finnish study,22 families were excluded when a proband was affected by familial hypercholesterolemia, whereas the US study7 excluded all the subjects affected by hypercholesterolemia. Farrall et al.23 also found a LOD score >1 in the same region using a cohort from across four European countries. In addition to confirming these previously reported results, our study clearly identifies the phenotypic subgroup that is linked to this locus and gives an approximate estimate of the associated λs in this subgroup.

The commonest recognised mechanism that gives rise to CAD is the accumulation of low-density lipoprotein-cholesterol (LDL-C) in the arterial wall, forming a ‘plaque’, followed by its rupture and blood clotting, leading to areas of increasing arterial obstruction and eventually vessel occlusion.3 This process is associated with a number of risk factors, particularly a high level of LDL-C circulating in the blood. In contrast, high-density lipoprotein-cholesterol (HDL-C) appears to play a protective role by inhibiting the oxidation of LDL-C and by transporting LDL-C from tissues to the liver where it is metabolised.3 A low level of HDL-C is an independent risk factor for CAD.3, 24, 25

The results of our study suggest a locus on chromosome 2 that predisposes to CAD in subjects whose cholesterol is not considered to be raised. A pathway independent of cholesterol may then have been exposed in this group. There are many contenders, not least any cause of arterial membrane inflammation (such as infection, blood pressure, hormones, increased level of inflammatory mediators) that can stimulate blood coagulation and artery occlusion.3, 26, 27, 28, 29 A strong candidate locus in the region of interest is the interleukin 1 gene cluster (IL1). An increased level of IL1 cytokines has been repeatedly associated with inflammatory mechanisms, vascular thrombogenicity and atherosclerosis both in vivo and in vitro.3, 27, 29, 30

An alternative is that a group with low HDL-C has been identified. Supporting this possibility are the results of two other studies: using the Framingham Heart Study data set, a quantitative trait locus in the same region was found to be significantly involved in variations of HDL-C levels.31, 32 Candidate genes that are possibly involved in the regulation of HDL-C levels in this region are PLA2R1 and OSBPL6.31 The product encoded by PLA2R1 is a receptor of the enzyme sPLA2 (secretory phospholipase A2), a major catabolyser of HDL-C in acute and chronic inflammatory conditions.33, 34 Association has been reported between increased levels of sPLA2 and CAD, particularly angina.35, 36, 37 The gene OSBPL6 codes for the oxysterol binding protein-like-6 receptor. Oxysterols, which bind to this receptor, are products of cholesterol oxidation and their levels are also associated with CAD.38 All of these candidate genes lie outside the 1-LOD support interval for the peak LOD score (Figures 1 and 2), but it has been found that linkage peaks often lie quite distant from disease loci.39

In summary, we report stronger evidence of linkage to CAD of the initially reported locus on chromosome 2 when allowing for hypercholesterolemia, the evidence for linkage being confined to subjects without hypercholesterolemia. This study illustrates the potential of inclusion of covariates to increase the power of genetic linkage analysis in the presence of heterogeneity. The chromosome 2 locus is a good candidate for further investigation in order to find the actual polymorphisms implicated in the predisposition to CAD in patients unaffected by hypercholesterolemia.