Susceptibility to coronary heart disease (CHD) has long been known to exhibit familial aggregation, with heritability estimated to be greater than 50%. The French Canadian population of the Saguenay-Lac Saint-Jean region of Quebec, Canada is descended from a founder population that settled this region 300–400 years ago, which may provide increased power to detect genes contributing to complex traits such as CHD. Probands with early-onset CHD, defined by angiographically determined coronary stenosis, and their relatives were recruited from this population (average sibship size of 6.4). Linkage analysis was performed after a genome-wide microsatellite marker scan of 42 families comprising 284 individuals. Nonparametric linkage (NPL) analysis provided suggestive evidence for a CHD susceptibility locus on chromosome 8 with an NPL score of 3.14 (P=0.001) at D8S1106. Linkage to this locus was verified by fine mapping in an enlarged sample of 50 families with 320 individuals. This analysis provided evidence of linkage at D8S552 (NPL score=3.53, P=0.0003), a marker that maps to the same location as D8S1106. Candidate genes in this region, including macrophage scavenger receptor 1, farnesyl-diphosphate farnesyltransferase 1, fibrinogen-like 1, and GATA-binding protein 4, were resequenced in all coding exons in both affected and unaffected individuals. Association studies with variants in these and five other genes did not identify a disease-associated mutation. In conclusion, a genome-wide scan and additional fine mapping provide evidence for a locus on chromosome 8 that contributes to CHD in a French Canadian population.
Coronary heart disease (CHD) is one of the leading causes of death in Western countries. The relative hazard of early-onset CHD death for monozygotic twins has been estimated to be 8.1 and 15.0 for men and women, respectively, compared to 3.8 and 2.6 for dizygotic twins,1 suggesting an underlying genetic basis. While some cases of CHD are due to monogenic disorders such as familial hypercholesterolemia (FH) (reviewed by Hobbs et al2), the majority of cases are thought to have a multifactorial origin. Family history is a significant risk factor,3 and the heritability of CHD is estimated to be >50%.4 Rare variants contributing to heart disease have been identified through investigations of monogenic disorders (reviewed by Hegele5), and several known genes harbor common variants that contribute to CHD susceptibility. However, other genetic risk factors for CHD remain undefined.
In a founder population (here referring to the modern-day descendants of a founder population), there is an increased chance that affected individuals share the same susceptibility allele inherited from a common ancestor. The population of the Saguenay-Lac Saint-Jean (SLSJ) region of Québec (280 000 individuals) is derived from approximately 2000 17th century founders. However, genealogical, historical, and demographic studies have revealed that the effective number of founders is smaller, perhaps as low as 400.6 The use of such a large population derived predominantly from a small founder group 12 generations ago provides a unique tool for identifying disease genes.7, 8 This approach has been effective in the identification of Mendelian disease genes in several founder populations, including those in Finland9 and Holland10 as well as in the SLSJ.11 The usefulness of founder populations for the discovery of genes underlying complex traits has only recently begun to be explored,12, 13, 14 but has nonetheless produced some notable recent successes.15, 16, 17
To detect CHD susceptibility loci, we performed a genome-wide scan on families with early-onset CHD from the SLSJ. Early-onset CHD was clinically defined in probands as greater than 50% stenosis in at least two coronary arteries before the age of 62 and 66 years for men and women, respectively, as described in Gaudet et al.18 This scan identified several regions with evidence of suggestive linkage. One locus in particular, at 8p22, had a nonparametric linkage (NPL) score of 3.14 (P=0.001). We examined this region by fine mapping and association studies of several candidate genes.
Materials and methods
Probands were ascertained from the records of the Clinical Research Unit, Complexe hospitalier de la Sagamie in Chicoutimi, Québec, a tertiary hospital. All subjects gave informed written consent according to institutional and national standards19 and all data and DNA samples were coded to maintain patient confidentiality.20 This study was approved by the Chicoutimi Hospital Ethics Committee, the Montreal General Hospital Research Institute Institutional Review Board, and the McGill University Institutional Review Board.
Probands had at least 50% stenosis in at least two coronary arteries before the age of 62 years (men) or 66 years (women). These age cut-offs were chosen to balance the aim of ascertaining early-onset CHD cases (to amplify the genetic contribution) against the need to recruit enough families for good power. Each proband had at least one sibling with CHD, and all four grandparents were French Canadian. Exclusion criteria included FH in the probands or the known presence of a family member homozygous for mutations in the lipoprotein lipase (LPL) gene.21 Siblings (of any age) were considered CHD cases if they met the same angiographic criteria or had a history of revascularization (1.2%), myocardial infarction (MI) (6.2%), or angina diagnosed according to standard clinical criteria22 (5.4%). Recent work provides compelling evidence that the same genetic variation can underlie differing definitions of CHD.23, 24 Some of the individuals included in this study were being treated with statins. Any improvement under statin treatment would not have affected the classification of individuals defined as cases. Because unaffected siblings do not contribute to the NPL statistic, statin use in ‘unaffected’ individuals could have led to a loss of power, but not to an increase in type I error. The recruited families have an average sibship size of 6–7 individuals (Table 1) and overlap with the families examined in a recently published candidate gene study.25
In stage I, 42 families were recruited including 119 individuals defined as affected (see above). The addition of unaffected siblings and parents brought the total number of individuals to 284. In stage II, fine mapping was performed on these individuals together with 42 additional individuals, including nine new families. One family included in stage I was later found to have individuals homozygous for an LPL mutation (D9N) and was not fine mapped. Thus, stage II comprised 50 families and 320 individuals. DNA was available from only 10 parents, so most families consisted of siblings only. The male to female ratio among affected individuals was 1.6:1 and 1.8:1 in stages I and II, respectively. Familial relationships were verified with RelCheck (version 0.67)26, 27 and GRR.28 One individual whose family relationship could not be confirmed was excluded from the analysis.
Genotyping and DNA sequencing
Genomic DNA was prepared from peripheral blood lymphocytes using the Blood & Cell Culture DNA Midi Kit (Qiagen). Methods for the genome-wide scan are described in Rioux et al.29 Briefly, a modified version of the Cooperative Human Linkage Centre (CHLC) Screening Set, version 6.0,30 that also included Généthon markers31 was used to create panels containing either 310 or 378 microsatellite markers with an average intermarker spacing of 11.1 and 8.7 cM, respectively, and average heterozygosity of approximately 75%. Fluorescently labeled markers (Research Genetics) were detected with ABI 377 and 3700 DNA analyzers. ABI 377 gels were processed using the software BASS/GRACE (LD Stein, unpublished data); size standards and alleles were determined by use of allele-calling software (MJ Daly, unpublished data). For the ABI 3700, alleles were determined using Genotype version 2.1 software (ABI) and PEDMANAGER version 0.9 (MJ Daly, personal communication). Markers with Mendelian inheritance errors were re-examined by manual genotype calling, and unresolved data were excluded on a per-family basis. Fine-mapping markers were selected from the integrated maps of the Marshfield Medical Research Foundation. After adding 15 new markers, the average intermarker distance across the fine-mapped region of chromosome 8 was 3.15 cM.
Comparative sequencing of candidate genes and ESTs was accomplished using the Dye Primer and Dye Terminator Cycle Sequencing Ready Reaction kits (PE Biosystems). PCR products were purified using magnetic beads (PerSeptive Diagnostics). Sequencing reactions contained 400 ng of template in 1.5 μl and 3 μl of assay mixture for each primer. PCR primers were designed using Primer3 software.32 The cycling parameters for the sequencing reactions were: 96°C for 10 s, 55°C for 5 s, and 70°C for 1 min (15 cycles), followed by 96°C for 10 s and 70°C for 1 min (15 cycles). Reaction products were precipitated and run on ABI 377 sequencers. For most SNPs, TaqMan™ assays33 were performed on ABI Prism 7700 Sequence Detectors. Primers and probes were from Invitrogen. Reactions were carried out in a 50 μl volume with a final concentration of 1 × PCR buffer, 4.5 mM MgCl2, 200 μM dNTP, 0.01 U/μl AmpliTaq Gold, 300 nM of each primer, 25 nM of each internal probe, and 20 ng DNA. Thermocycling conditions were individually optimized for each assay. Some SNPs were genotyped by direct sequencing.
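The TaqMan reaction composition above is given as final concentrations in a 50 μl volume. As an illustration of how such concentrations translate into pipetting volumes, the Python sketch below applies C1V1 = C2V2 to each reagent. All stock concentrations (10× buffer, 25 mM MgCl2, 10 mM dNTP, 5 U/μl AmpliTaq Gold, 10 μM primers and probes, 10 ng/μl DNA) are hypothetical assumptions rather than values from this study, the sketch assumes the buffer does not already supply MgCl2, and the two probe entries assume a typical allelic-discrimination design with one probe per allele.

```python
# Hypothetical master-mix calculator for one 50 ul TaqMan reaction.
# Final concentrations follow the text; ALL stock concentrations are assumptions.
REACTION_VOLUME_UL = 50.0

# reagent -> (final concentration, stock concentration), same units within a pair
REAGENTS = {
    "PCR buffer (x)": (1.0, 10.0),
    "MgCl2 (mM)": (4.5, 25.0),
    "dNTP (uM)": (200.0, 10_000.0),
    "AmpliTaq Gold (U/ul)": (0.01, 5.0),
    "forward primer (nM)": (300.0, 10_000.0),
    "reverse primer (nM)": (300.0, 10_000.0),
    "allele 1 probe (nM)": (25.0, 10_000.0),  # assumed two-probe design
    "allele 2 probe (nM)": (25.0, 10_000.0),
}


def stock_volume(final, stock, total=REACTION_VOLUME_UL):
    """Solve C1 * V1 = C2 * V2 for V1, the volume of stock to add (ul)."""
    return final * total / stock


def master_mix(dna_ng=20.0, dna_stock_ng_per_ul=10.0, total=REACTION_VOLUME_UL):
    """Return the volume (ul) of each component, topping up with water."""
    volumes = {name: stock_volume(final, stock, total)
               for name, (final, stock) in REAGENTS.items()}
    # 20 ng genomic DNA per reaction from an assumed 10 ng/ul stock.
    volumes["genomic DNA"] = dna_ng / dna_stock_ng_per_ul
    used = sum(volumes.values())
    if used > total:
        raise ValueError("stocks are too dilute for this reaction volume")
    volumes["water"] = total - used
    return volumes


if __name__ == "__main__":
    for name, ul in master_mix().items():
        print(f"{name:22s} {ul:6.3f} ul")
```

Under these assumed stocks, the MgCl2 volume comes out at 9 μl and the enzyme at 0.1 μl per reaction; in practice one would scale each volume by the number of reactions plus a pipetting excess.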
For the whole-genome scan, marker data and sex-averaged genetic distances were obtained from the CHLC map;34 for fine mapping, they were obtained from both the CHLC map and the Marshfield map.35 Allele frequencies were estimated from the inferred founder genotypes using PEDMANAGER. Although multipoint NPL analysis does not include the unaffected family members in the final statistic, they are used to reconstruct the inheritance vectors more accurately. The genome-wide scan and the fine mapping of chromosome 8 were analyzed with the GENEHUNTER 2.1_r2 computer package.36, 37 The X chromosome markers were analyzed with GENEHUNTER version 1.3. NPL results were obtained using the ‘Sall’ scoring function. Because the genome-wide scan was analyzed after merging two overlapping screening sets (310 and 378 markers), information content may vary across genomic regions; in regions with lower information content, the test for linkage would be conservative but still valid.
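To make the ‘Sall’ scoring function more concrete, the Python sketch below computes the Whittemore–Halpern S_all statistic for a single sibship whose inheritance vector is assumed to be fully known, then standardizes it against its null distribution over all equally likely inheritance vectors to give a family-level Z score. The founder-allele labels and function names are hypothetical. The sketch assumes untyped parents and at least two affected full siblings; it does not reproduce GENEHUNTER, which in addition averages the score over the posterior distribution of inheritance vectors when marker data are incomplete.

```python
from collections import Counter
from itertools import product
from math import factorial, sqrt


def s_all(founder_alleles):
    """Whittemore-Halpern S_all for one family with a fully known inheritance vector.

    founder_alleles holds one pair of founder-allele labels per affected
    individual, recording which founder chromosome each allele descends from.
    """
    total = 0
    # Enumerate every way of choosing one allele from each affected individual.
    for pick in product(*founder_alleles):
        term = 1
        # Multiply b! over founder alleles, where b is how often each one was picked.
        for count in Counter(pick).values():
            term *= factorial(count)
        total += term
    return total / 2 ** len(founder_alleles)


def sibship_null(n_affected):
    """Null mean and SD of S_all for n affected full siblings.

    Parents are treated as untyped founders with hypothetical labels
    1/2 (father) and 3/4 (mother); all 4**n inheritance vectors are equally likely.
    """
    child_options = [(p, m) for p in (1, 2) for m in (3, 4)]
    scores = [s_all(list(kids)) for kids in product(child_options, repeat=n_affected)]
    mean = sum(scores) / len(scores)
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return mean, sd


def family_npl_z(founder_alleles):
    """Standardized family score Z = (S_all - E[S_all]) / SD[S_all]."""
    if len(founder_alleles) < 2:
        raise ValueError("need at least two affected siblings")
    mean, sd = sibship_null(len(founder_alleles))
    return (s_all(founder_alleles) - mean) / sd
```

For two affected siblings who share both parental alleles identical by descent, `family_npl_z([(1, 3), (1, 3)])` returns about 1.41, whereas a pair sharing neither allele scores below zero. In the standard NPL formulation, family scores are then combined as a weighted sum whose squared weights sum to one (for example, equal weights of 1/√N for N families).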
Lander and Kruglyak's38 thresholds for suggestive and significant linkage are based on an infinitely dense map and asymptotic theory. With markers spaced every 10 cM, as in a realistic genome-wide scan, these thresholds yield fewer false positives than expected,39 so tests that use them are conservative. Such high thresholds may also be unreasonable for a complex trait when data are missing.40 To assess the genome-wide significance of our results, we used the method of Sawcer et al.39 We performed 2000 gene-dropping simulations on all autosomes, holding family structures, genetic map, allele frequencies, and missing data rates constant. Empirical genome-wide P-values were calculated as the proportion of replicates with a maximum score higher than the observed score.
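The empirical genome-wide P-value described above reduces to a simple count once the simulated replicates are available. The Python sketch below assumes that the maximum genome-wide NPL score of each gene-dropping replicate has already been computed by rerunning the linkage analysis on each simulated data set; the function name is illustrative, and replicates tied with the observed score are not counted, following the "higher than" definition in the text.

```python
def empirical_genome_wide_p(observed_max, replicate_maxima):
    """Empirical genome-wide P-value from null simulation replicates.

    replicate_maxima holds the maximum genome-wide score from each simulated
    (gene-dropped) data set; the P-value is the proportion of replicates whose
    maximum is strictly higher than the observed maximum.
    """
    if not replicate_maxima:
        raise ValueError("need at least one simulation replicate")
    exceeding = sum(1 for m in replicate_maxima if m > observed_max)
    return exceeding / len(replicate_maxima)
```

Under this definition, if 125 of 2000 replicates exceeded the observed maximum, the empirical P-value would be 125/2000 = 0.0625, that is, about one null genome scan in 16.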
The family-based association test (FBAT) program was used to test the association of microsatellite and SNP markers with CHD in the presence of linkage.41, 42 Because the transmission of alleles to multiple affected individuals in a sibship is not independent in the presence of linkage, we used the empirical variance option. We used TRANSMIT43 (version 2.5.4) to test for haplotype associations of SNPs from single genes as well as for two-microsatellite haplotypes. TRANSMIT assumes no recombination between the markers in a haplotype and was therefore applied only to intervals and families in which no recombinations were inferred.
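The role of the empirical variance option can be illustrated with a simplified FBAT-style statistic. The Python sketch below assumes an additive allele coding, gives every affected child the trait code T = 1, and assumes both parents are genotyped, so the expected allele count in a child is half the sum of the parents' copies. FBAT itself also handles families with missing parental genotypes by conditioning on sufficient statistics, which matters here because most families consisted only of siblings; the function name is hypothetical. Squaring each family's total deviation, rather than summing per-child variances, is what keeps correlated transmissions to affected siblings from inflating the statistic.

```python
from math import sqrt


def fbat_z_empirical(families):
    """Simplified FBAT-style Z score with an empirical (robust) variance.

    families: list of (father_copies, mother_copies, child_copies), where each
    value is the number of copies (0, 1 or 2) of the tested allele and
    child_copies lists the affected children. Assumes genotyped parents,
    an additive coding and a trait code T = 1 for every affected child.
    """
    total_u = 0.0
    variance = 0.0
    for father, mother, children in families:
        # Expected allele count in a child under Mendelian transmission.
        expected = father / 2 + mother / 2
        # Observed minus expected, summed over the affected siblings.
        u = sum(x - expected for x in children)
        total_u += u
        # Square the family total so correlated siblings are not treated as independent.
        variance += u * u
    if variance == 0:
        raise ValueError("no informative transmissions")
    return total_u / sqrt(variance)
```

For example, two families of heterozygous parents whose affected children carry 2, 2 and 1, 2 copies give Z = 3/√5 ≈ 1.34; Z² can be compared with a χ² distribution with one degree of freedom.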
Stage I – genome-wide scan
The results of NPL analysis of the genome-wide scan of the initial 42 families are shown in Figure 1 and summarized in Table 2. Significance was calculated from the distribution of inheritance vectors.37 The maximum multipoint NPL score (3.14) was obtained on chromosome 8 at the marker D8S1106 (locus-specific P-value 0.001). This score has an empirical genome-wide P-value of 0.0625: although it does not reach genome-wide significance, only about one in 16 genome scans would be expected to exceed an NPL score of 3.14 by chance.
Stage II – additional families and fine mapping
Fine mapping was performed over a 70 cM region of chromosome 8 centered on the NPL peak (D8S1106), in the original genome-scan families together with additional family members and new families. This analysis included six genome-scan markers plus 15 additional markers. The NPL plot for this fine mapping is shown in Figure 2. The maximum multipoint NPL score increased to 3.53 with a locus-specific P-value of 0.00033 at the marker D8S552, which has the same genetic map position as D8S1106 (Table 3).
Sequence analysis of candidate genes
Within 10 cM of the NPL peak on chromosome 8 at D8S552–D8S1106 are several plausible candidate genes for CHD, including macrophage scavenger receptor 1 (MSR1), fibrinogen-like 1 (FGL1), farnesyl-diphosphate farnesyltransferase 1 (FDFT1), and GATA-binding protein 4 (GATA4) (see Figure 2). Five other candidate genes were selected from this region on the basis of their proximity to the peak and the availability of complete coding sequence. In addition, the LPL gene, 13 cM from the peak, was selected as a candidate gene.
The entire coding regions of MSR1, FGL1, FDFT1, and GATA4 were scanned for sequence variations in individuals from our study. The resequencing of these genes was performed on one affected and one unaffected individual from each of the six families with the highest NPL scores at D8S552 as well as two individuals from CEPH families. Resequencing identified 21 SNPs in the candidate genes. SNP discovery results are summarized in Table 4. All exonic SNPs as well as some intronic SNPs (discovered while sequencing the exons) were genotyped for all 320 individuals in the 50 Stage II families.
To test for association given the linkage observed at 8p22, we performed an FBAT41 for the alleles of the microsatellite markers used in stage II, and for the 21 SNPs found in the genes described above. We did not observe excess transmission of any allele. In particular, neither the missense mutation nor the nonsense mutation in MSR1 showed increased transmission to CHD cases, although recent work suggests that they are associated with prostate cancer.44, 45
Genotype data were also analyzed for transmission distortion of haplotypes to affected offspring with the software TRANSMIT.43 We examined haplotypes consisting of two microsatellites as well as haplotypes of SNPs contained within a gene. Neither two-microsatellite marker haplotypes nor haplotypes of SNPs within genes were associated with CHD.
In a genome-wide scan for CHD in 42 large sibships from the SLSJ region, we identified a linked region of chromosome 8 with an NPL score of 3.14 at D8S1106. After fine mapping, the strongest evidence for linkage was found at D8S552 (90 kb from D8S1106) with an NPL score of 3.53 (P=0.00033). This constitutes strong evidence for a CHD susceptibility locus in chromosome region 8p22 in the SLSJ population. Other chromosomal regions with more modest NPL scores may harbor additional loci that contribute to CHD in this population.
Several other genome-wide scans for CHD have been published (for a review, see Topol et al46). Using a phenotype similar to ours, Pajukanta et al12 showed suggestive evidence for linkage to two regions: 2q21.1–22 and Xq23–36. For these two regions, our genome-wide scan obtained NPL scores greater than 1.5 (Figure 1). A study of acute coronary syndrome in Australia identified a locus at 2q36–37.3 with a LOD score of 2.63.47 A susceptibility locus for CHD was identified at 16p13 in a study sample from Mauritius.48 In addition, a study of MI using affected sibling pairs identified linkage at 14q32 (LOD score of 3.9).49 Comparing these genome scans with ours does not reveal any chromosomal region common to all studies. Moreover, a meta-analysis of these four studies50 provided evidence for linkage to chromosome 3q26–27, a region that did not have the highest LOD score in any individual study. In more recent genome scans, Hauser et al51 reported a LOD score of 3.3 at chromosome 3q13, and Samani et al52 implicated two regions of chromosome 2: one at the previously identified locus of Pajukanta et al12 (2q21.1–22), and another (2p11) that overlaps with a significant finding of Wang et al.53 Interestingly, Wang et al53 reported a LOD score of 11.68 on chromosome 1. The PROCARDIS study identified a linkage peak on chromosome 17 for MI that was replicated in a follow-up sample.54 In interpreting these disparate findings, one should consider some general difficulties of the genome-wide linkage scan approach for complex traits. Genetic heterogeneity, within and across studies, reduces power and leads to heterogeneous results. In addition, the precise phenotype is not always the same across studies. In a meta-analysis of complex trait genome scans, Altmuller et al55 note that lack of replication is commonplace and that there are very few prognostic indicators of success.
The authors did, however, find that sampling only one ethnicity (as in this study) increased the chance of observing a significant LOD score. Several factors may favor the detection of CHD genes in the SLSJ founder population, including reduced genetic heterogeneity and similarities in diet and lifestyle across the population. This may explain why we identified a novel locus at 8p22 that has not been detected in other populations.
Several whole-genome scans for quantitative traits related to CHD have implicated the same region of chromosome 8 that we identified. Naoumova et al56 found evidence for linkage (NPL score of 2.10) at D8S1106 for combined hyperlipidemia, cholesterol and triglyceride levels in familial combined hyperlipidemic families. Notably, a meta-analysis of four genome scans57 identified a peak for HDL (LOD of 2.0) at D8S1130, which had a LOD score of 3.11 in this study (Table 3). In addition, there are reports of linkage signals in this region for type II diabetes58 and for measures of insulin response and abdominal obesity in Mexican Americans.59
GATA4 is a transcription factor that regulates cardiac-specific genes during development. It has also been implicated in the development of cardiac hypertrophy60, 61 and mediates responses to a variety of hypertrophic stimuli (see Liang and Molkentin62). We did not observe any coding variants in GATA4, although we did identify one intronic variant (Table 4).
FDFT1 encodes squalene synthase, the enzyme that catalyzes the first committed step of the sterol branch of the cholesterol biosynthetic pathway. Mice homozygous for a null allele of the squalene synthase gene are embryonic lethal.63 In addition, inhibitors of this enzyme have been shown to reduce cholesterol and triglyceride levels.64, 65, 66 Direct resequencing of FDFT1 identified seven SNPs, one of which results in a lysine-to-arginine substitution at codon 45; the other six were silent substitutions or noncoding variants.
MSR1 (also known as class A scavenger receptor) is expressed by monocytes and macrophages as well as in liver and brain.67 The expression of this receptor is increased in the macrophages of atherogenic lesions68 and it plays a role in the development of foam cells.69 Moreover, MSR1 null mice (Msr−/−) on a background of atherosclerosis-susceptible apoE-deficient (Apoe−/−) mice demonstrated significantly smaller atherosclerotic lesions compared with Apoe−/− littermates.70 MSR1 has 11 exons and three isoforms.71 In MSR1, we found one missense mutation and one nonsense mutation, both in exon 6. The nonsense mutation is predicted to yield a protein truncated at codon 293. However, the frequency of this mutation in our families was only 0.04, and no homozygotes were observed.
The putative protein encoded by FGL1 has high sequence homology to the beta and gamma subunits of human fibrinogen,72 and contains four conserved cysteines, which are characteristic of the fibrinogen protein family. Although little is known about FGL1, it has mouse and rat orthologues, is expressed most highly in the liver,73 and binds to the fibrin matrix of a plasma clot.74 We identified one missense mutation (I15T, single-letter amino acid code) near the putative signal peptide cleavage site.
In addition to the aforementioned genes, five additional genes were scanned for coding SNPs because of their proximity to the NPL peak on chromosome 8. The 10 SNPs identified in these genes are summarized in Table 4. Overall, 6 coding and 15 noncoding variants were identified in the linked region in the SLSJ study sample.
We also examined LPL because of its proximity to the chromosome 8 susceptibility locus. The autosomal recessive form of familial LPL deficiency (type I hyperchylomicronemia) has been described in Québec75 and elsewhere.76 Additional polymorphic variants of the LPL gene are common and well characterized.77 Although we excluded families with individuals homozygous for known LPL mutations, we included families with heterozygous individuals. Only two families had at least two affected siblings carrying a known LPL mutation (D9N). Resequencing identified no new coding variants.
Whole-genome association studies have recently generated considerable excitement in cardiovascular genetics. Specifically, reports have identified an association between a region of chromosome 9 and heart disease.23, 24 Although the present study does not implicate this region, association and linkage studies have power to detect different disease-causing variants across the spectrum of allele frequencies.
In summary, we have identified significant linkage for CHD at 8p22. Within this region, an interval was selected for further analysis. We surveyed the coding regions of four candidate genes for CHD susceptibility and an additional five genes. Although none of the SNPs that we analyzed were significantly associated with CHD, we cannot completely rule out these genes, as only coding regions were resequenced. Future work should include the analysis of high-density SNPs in the region.
We thank the patients and their families for their participation in this study. This work was supported by funds from the Canadian Institutes of Health Research (CIHR) (JCE, KM, TJH, and DG), the Canadian Genetic Diseases Network (TJH, KM), the Mathematics of Information Technology and Complex Systems (KM), the CIHR-funded ECOGENE-21 Canadian Alliance for Health Research program (DG), the Canadian Foundation for Innovation (TJH, JCE), and a research contract from Bristol-Myers Squibb, Millennium Pharmaceuticals Inc., and Affymetrix (TJH). TJH is a recipient of an Investigator Award from the CIHR and a Clinician-Scientist Award in Translational Research from the Burroughs Wellcome Fund. DG is the chairholder of the Canadian Research Chair in preventive genetics and community genomics (www.chairs.gc.ca). JCE is a research scholar of the Fonds de la recherche en santé du Québec (FRSQ). We thank M-C Vohl, L Coderre, A Sniderman, R Do, and members of the McGill University and Genome Québec Innovation Centre for helpful discussions, two anonymous reviewers for their suggestions, JC Loredo-Osti for statistical advice, the Chicoutimi Hospital Community Genomic Medicine Centre staff for their contribution, and L Essiembre and C Guillemette for pedigree information. The authors declare no conflicts of interest.