Main

With the sequencing of the human genome it has become apparent that variation in individuals is quite extensive. There is increasing evidence that this variation is best described by groups of associated polymorphisms referred to as haplotypes.13 Haplotypes reflect global gene structure, encompassing chromosomal blocks that have remained unbroken by recombination during the population history of the gene. Thus, haplotypes capture the majority of common variation in a gene; consequently, the use of haplotypes is more likely to identify disease-variation associations than is the use of a random single polymorphism. Identification of a haplotype associated with increased or decreased disease risk should facilitate identification of the actual functional variant that affects disease risk, because this variant should lie on chromosomes identified by that haplotype.4,5 Genotyping to determine haplotype structure and frequencies is required for this type of analysis. A major challenge is determination and selection of the polymorphisms that will be used to determine haplotypes in a given population. We present a two-step method for this process, illustrating its use with the example of the lipoprotein lipase (LPL) gene.

LPL plays a major role in lipid metabolism. Located on capillary endothelium, LPL hydrolyzes triglycerides of chylomicrons and very low density lipoproteins, generating free fatty acids and monoacylglycerol. Complete deficiency of LPL results in the familial chylomicronemia syndrome.6 Because LPL activity affects the concentration of triglycerides, an important cardiovascular risk factor, LPL has been studied as a candidate gene for atherosclerosis. Several studies have examined linkage and associations of polymorphisms within the LPL gene with features of the cardiovascular dysmetabolic syndrome, including hypertension, insulin resistance, dyslipidemia, and atherosclerosis. For example, Wu et al. demonstrated linkage of the LPL locus to systolic blood pressure in nondiabetic relatives of Taiwanese subjects with type 2 diabetes.7 The Hin dIII polymorphism in intron 8 of the LPL gene has been associated with measurements of insulin resistance in normoglycemic Caucasian and Hispanic subjects8 and Chinese subjects.9 The Ser447Stop polymorphism has been found to be associated with decreased atherosclerosis risk.10 Both the Hin dIII and Ser447Stop polymorphisms are in the 3′ end of the LPL gene, downstream of a recombination hotspot.11 Our long-term goal is to understand how variation in the 3′ end of this gene affects the phenotypes of the dysmetabolic syndrome. Thus, as a first step in investigating the role of this gene, we genotyped a series of LPL 3′ end single nucleotide polymorphisms (SNPs) to determine the haplotype structure in this region. Mexican-American families ascertained via a proband with coronary artery disease were genotyped to test disease associations. A second population, randomly selected non-Hispanic Caucasians, was studied to compare its haplotype distribution with that of the Mexican-Americans. Not only are there differences between the two groups in haplotype distribution, but the results also indicate an association of coronary artery disease with LPL haplotypes in the Mexican-American population.

MATERIALS AND METHODS

Subjects

The UCLA/Cedars-Sinai Mexican-American Coronary Artery Disease (MACAD) Project enrolls families ascertained through a proband with coronary artery disease, determined by evidence of myocardial infarction on electrocardiogram or hospital record, evidence of atherosclerosis on coronary angiography, or history of coronary artery bypass graft or angioplasty. DNA is obtained from all available family members, and the adult offspring (age 18 or older) of the proband and the spouses of those offspring are also asked to undergo a series of tests to characterize their metabolic and cardiovascular phenotype, including indices of insulin resistance determined by euglycemic clamp study, lipid parameters, lipase activities, and carotid intima-media thickness.

In a separate project, non-Hispanic Caucasian families were recruited for a genetic linkage study to determine the influence of specific genes on interindividual variation in the lipoprotein response to a low-fat, high-carbohydrate diet. Siblings were placed on either a high-fat or a low-fat diet and changes in lipids and lipoproteins were monitored. In the current study, we examined this population in terms of haplotype frequency for comparison to Mexican-Americans.

All studies were approved by the Human Subjects Protection Institutional Review Boards at UCLA, Cedars-Sinai, and UC Berkeley. All subjects gave informed consent prior to participation.

Genotyping

The first stage of our haplotyping methodology consists of genotyping a number of single nucleotide polymorphisms (SNPs) spanning a region of a candidate gene in a limited number of subjects. Haplotypes are then constructed using these variants, with subsequent selection of a smaller number of variants that allow discrimination of the most common haplotypes on the majority of chromosomes observed in the population. In the second stage of the haplotyping protocol, the restricted set of SNPs identified in the first stage is genotyped in a large number of individuals using a high-throughput technology and used to determine haplotypes on a population scale.

Stage 1

Twenty-nine subjects from 8 randomly selected families from MACAD were genotyped at 10 single nucleotide polymorphisms (4872, 5168, 5441, 6863, 7315, 8292, 8393, 8852, 9040, and 9712) originally delineated in the Molecular Diversity and Epidemiology of Common Disease (MDECODE) project, a study of Finns, non-Hispanic Caucasian Americans, and African American subjects.12 The numbering of the SNPs corresponds to that reported by Nickerson et al.12 and corresponds to Genbank accession number AF050163. 8393 is the Hin dIII variant located in intron 8 and 9040 is the Ser447Stop variant located in exon 9. 4872, 5168, and 5441 are in intron 6; 6863 and 7315 are in intron 7; 8292 and 8852 are in intron 8; 9712 is in intron 9; these markers were selected because they spanned a region of the LPL gene downstream of a recombination hotspot and had a minor allele frequency of 15% or greater in MDECODE.12 PCR amplification followed by restriction digest with Hin dIII was used to genotype the polymorphism at 8393. A single nucleotide primer extension method was used to genotype the remaining nine SNPs (4872, 5168, 5441, 6863, 7315, 8292, 8852, 9040, and 9712). Analysis of these initial data showed that a restricted set of six SNPs encompassed all the major 3′ end haplotypes.
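
The reduction from a larger marker panel to a restricted set can be illustrated with a minimal sketch. It assumes that selection amounts to finding the smallest subset of markers whose alleles still separate every common haplotype. The marker labels (m1 to m5) and allele patterns are hypothetical and are not the Table 2 data, and the exhaustive search is an illustrative stand-in rather than the procedure actually used in this study.

```python
from itertools import combinations


def minimal_tag_snps(haplotypes, markers):
    """Return the smallest marker subset that still separates every haplotype.

    haplotypes: sequence of equal-length allele strings, one character per marker.
    markers: marker labels, in the same order as the allele characters.
    """
    distinct = len(set(haplotypes))
    # Try subsets of increasing size so the first hit is a smallest one.
    for size in range(1, len(markers) + 1):
        for idx in combinations(range(len(markers)), size):
            projected = {tuple(h[i] for i in idx) for h in haplotypes}
            if len(projected) == distinct:
                return tuple(markers[i] for i in idx)
    return tuple(markers)


# Hypothetical pilot data: four haplotypes typed at five markers.
MARKERS = ("m1", "m2", "m3", "m4", "m5")
HAPLOTYPES = ("11111", "12211", "11122", "12222")

TAGS = minimal_tag_snps(HAPLOTYPES, MARKERS)
print(TAGS)
```

In this toy panel, m1 is monomorphic and m3 and m5 duplicate the information in m2 and m4, so two markers suffice. Exhaustive search is practical only for small panels such as the ten SNPs typed here.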

Stage 2

Large-scale genotyping of the six SNPs in 514 subjects from 85 MACAD families and 629 subjects from 157 non-Hispanic Caucasian families was performed using the 5′-exonuclease (Taqman MGB) assay.13 PCR primer and oligonucleotide probe sequences are listed in Table 1. In this assay, allele-specific oligonucleotide probes are labeled with different fluorophores (FAM or VIC) at their 5′-ends and with a quencher molecule at the 3′-end. The quencher interacts with the fluorophores by fluorescence resonance energy transfer, quenching their fluorescence. These probes are included in the PCR reaction mixture amplifying a 100 to 150 base pair segment with the polymorphism at the center. During annealing, the probes hybridize to the PCR products, and during extension, the 5′-3′ exonuclease activity of the DNA polymerase degrades perfectly matched annealed probes, separating the fluorophore from the quencher. Imperfectly matched probes are displaced into solution without degradation. Comparison of relative fluorescence from each fluorophore allows determination of genotype.

Table 1 Primers and probe sequences used in the 5′-exonuclease assay

Data analysis

Based on pedigree structures and genotype data of all individuals in each pedigree, haplotypes were reconstructed as the most likely set (determined by the maximum likelihood method) of fully determined parental haplotypes of the marker loci for each individual in the pedigree, using the simulated annealing algorithm implemented in the program Simwalk2.14 All comparisons between groups of subjects comprised comparisons of unrelated founders, and only founder chromosome data are presented in the tables. Founder haplotypes, i.e., those haplotypes from parents and individuals marrying into the family, were used to calculate haplotype frequencies in 482 chromosomes from 241 Mexican-American founders and in 582 chromosomes from 291 non-Hispanic Caucasian founders. Six Mexican-American and 21 non-Hispanic Caucasian founders were excluded from analysis because their haplotypes could not be unambiguously determined. The χ2 test was used to compare allele and haplotype frequencies between the Mexican-Americans without coronary artery disease and the non-Hispanic Caucasians.

A case-control association study of coronary artery disease was performed by comparing haplotype frequencies between Mexican-American founders with and those without coronary artery disease. The cases were 77 probands (154 chromosomes) with coronary artery disease; the controls (164 individuals, 328 chromosomes) were their spouses plus the spouses marrying into the offspring generation. Because the cases and controls were genetically unrelated, their allele and haplotype frequencies and gender distribution were compared using the χ2 test. Student’s t test was used to compare the mean age of the cases versus the controls. Odds ratios for coronary artery disease by haplogenotype were calculated, using logistic regression analysis to adjust for any confounding effects of age or sex in the case-control comparison. Analyses were performed using SAS System software.15
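
As an illustration of the odds ratio calculation, the following sketch computes an unadjusted odds ratio with a Wald (Woolf) 95% confidence interval from a 2 × 2 table. The counts are hypothetical and are not taken from this study. The adjusted estimates reported here came from logistic regression in SAS, which this sketch does not reproduce.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2 x 2 table.

    a, b: cases with and without the haplogenotype of interest.
    c, d: controls with and without the haplogenotype of interest.
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) by Woolf's method.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
OR, LOWER, UPPER = odds_ratio_ci(10, 20, 20, 10)
print(f"OR = {OR:.2f} (95% CI {LOWER:.2f} to {UPPER:.2f})")
```

An interval that excludes 1 corresponds to a nominally significant association at the 5% level.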

RESULTS

In the first stage, the haplotypes of 28 unique chromosomes from unrelated founders were derived using Mexican-American family data and are shown in Table 2 in order of frequency. These results were used to select the markers genotyped in the large population samples. As seen in Table 2, markers 7315, 8292, 8393, 8852, and 9040 are sufficient to distinguish the haplotypes from each other. In addition to these five SNPs, 9712 was also chosen because it is predicted to distinguish two major ancient clades according to the haplotype tree constructed by Templeton et al16 in the MDECODE project. The results reported herein are consistent with their study of the haplotype structure of 9.7 kb of the LPL gene that described four ancient cladistic groups; markers 7315, 8393, and 9712 should be able to distinguish all four of the ancient 3′LPL clades.16

Table 2 Pilot study LPL haplotypes

In the second stage, the six selected markers were genotyped in 514 Mexican-American subjects from 85 families and 629 subjects from 157 non-Hispanic Caucasian families. The allele frequencies are shown in Table 3. The markers from Mexican-Americans without coronary artery disease are presented in this table in order to eliminate any disease-based ascertainment bias in delineating the ethnic comparison. Of note, although 9040 (Ser447Stop) was extremely rare in the previous MDECODE study subjects (not detected in African Americans or Finns and found with a frequency of 4% in U.S. non-Hispanic Caucasians12), in this study it was found with a frequency of 7% in Mexican-Americans and 10% in our non-Hispanic Caucasians. Comparing Mexican-Americans to non-Hispanic Caucasians, the allele frequencies were significantly different for four out of the six variants (Table 3).

Table 3 LPL SNP allele frequencies in Mexican-Americans and non-Hispanic Caucasians

The founder haplotype frequencies from the Mexican-Americans without coronary artery disease were compared with those of the non-Hispanic Caucasians. The six most common haplotypes, comprising over 99% of the observed haplotypes for each group, are presented in Table 4. Both groups shared haplotype 1 as the most common haplotype. There were several differences between the two groups in haplotype distribution. Haplotypes 2, 3, 4, and 5 were more common in the non-Hispanic Caucasian population; haplotypes 1 and 6 were more common in the Mexican-Americans. These differences reached statistical significance for the three most frequent haplotypes.

Table 4 LPL haplotype frequencies in Mexican-Americans compared with non-Hispanic Caucasians

In the case-control study, Mexican-American probands with coronary artery disease were compared with their spouses and the spouses of their offspring, none of whom had coronary artery disease. Thus, these case and control individuals were all genetically unrelated. The mean age of the cases was 62.2 years; that of the controls was 42.6 years (P < 0.0001). This age difference was expected, given that the control group comprised individuals from both the parental and offspring generations. The sex distribution was similar between the groups, with males comprising 44% of the cases and 38% of the controls (χ2 = 0.9, P = 0.35).
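
The χ2 comparison of sex distribution can be checked with a short sketch. The counts below are an assumption: they are back-calculated from the rounded percentages (44% of 77 cases, 38% of 164 controls) and are not taken from the underlying data. The test is the Pearson χ2 for a 2 × 2 table without continuity correction, with the P value obtained from the one-degree-of-freedom χ2 distribution.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and P value for a 2 x 2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed counts reconstructed from rounded percentages:
# 34 of 77 cases male, 62 of 164 controls male.
CHI2, P_VALUE = chi_square_2x2(34, 43, 62, 102)
print(f"chi2 = {CHI2:.2f}, P = {P_VALUE:.2f}")
```

With these reconstructed counts the statistic is approximately 0.88 and the P value approximately 0.35, in line with the values reported above.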

The genotype frequencies for all six markers were in Hardy-Weinberg equilibrium for both the cases and the controls. Allele frequencies of the six SNPs did not differ significantly among the Mexican-Americans according to coronary artery disease status (Table 5). A comparison of genotype frequencies showed no differences between cases and controls, except for a borderline significant difference for the 8393 (Hin dIII) variant (P = 0.05). However, comparison of the common haplotype frequencies between the Mexican-Americans with and without coronary artery disease revealed a significant decrease in the frequency of the most common haplotype in those with disease (Table 6). This implies an increase in the frequency of less common haplotypes among cases, the detection of which was hindered by the available sample size. Haplotype 1 was associated with a significantly decreased risk of coronary artery disease (P = 0.03). Of the less common haplotypes, haplotype 4 showed the strongest association with increased risk of coronary artery disease (P = 0.10), though this result did not attain statistical significance with the given sample size. A comparison of subjects homozygous for haplotype 1 with subjects with all other genotypes is presented in Table 7. Homozygosity for haplotype 1 was associated with protection against coronary artery disease with an odds ratio of 0.50 (95% CI 0.27–0.91). Use of the logistic regression model to adjust for age and sex, separately and in combination (Table 7), did not alter the significance of this association (odds ratio estimates from 0.39 to 0.51). None of the haplotypes other than haplotype 1 showed a statistically significant association with coronary artery disease (data not shown).
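
A Hardy-Weinberg check of the kind mentioned above can be sketched as a goodness-of-fit χ2 test on the three genotype counts of a biallelic SNP. The source does not state which test was used, so this is an assumption, and the genotype counts below are hypothetical.

```python
import math


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions for a biallelic marker."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts that match Hardy-Weinberg proportions exactly.
HWE_CHI2, HWE_P = hardy_weinberg_chi2(25, 50, 25)
print(f"chi2 = {HWE_CHI2:.2f}, P = {HWE_P:.2f}")
```

A large P value gives no evidence of departure from equilibrium. For markers with very rare genotypes an exact test is generally preferred to the χ2 approximation.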

Table 5 LPL SNP allele and genotype frequencies in Mexican-Americans with and without CAD
Table 6 LPL haplotype frequencies in Mexican-Americans with and without coronary artery disease
Table 7 Logistic regression analysis comparing haplotype 1 homozygotes with all other haplogenotypes

DISCUSSION

We have illustrated herein the use of a small pilot study in selecting which polymorphisms to genotype in a large sample with the goal of delineating the most common haplotypes in a population. Application of this approach to the LPL gene revealed significant differences in haplotype frequency between Mexican-Americans and non-Hispanic Caucasians as well as an association of LPL haplotypes with coronary artery disease within the Mexican-Americans.

Currently there is much interest in the use of haplotype data in the genetics of common disease. Investigators are faced with the considerable challenge of how many and which variants to genotype in a given candidate gene for haplotype determination. Gabriel et al.3 sequenced 13 megabases across the genome in subjects from Africa, Europe, and Asia. They showed that the human genome is organized in haplotype blocks (most of which are longer than 10 kb), with three to five commonly occurring (> 5%) haplotypes per block. Only six to eight variants were sufficient to define the most common haplotypes in each block. The challenge is how to select these variants efficiently and affordably. In the protocol described here, the first stage is to genotype a number of variants that span a genomic region of interest. This is performed in a subset of the study population to minimize costs. These data are then used to determine the haplotypes in that region. The most frequently occurring haplotypes are then identified, and only those SNPs that are necessary to define these haplotypes (typically six or fewer such haplotypes) are then genotyped on a large scale, yielding the most common haplotypes in a population for association analysis. The availability of family data assists this approach by facilitating unambiguous determination of haplotypes.

To our knowledge, extensive genotyping of the LPL gene to generate haplotypes in Mexican-Americans has not been previously reported. We found that six markers in the 3′ end of LPL allowed us to distinguish the most common haplotypes occurring in two major U.S. ethnic groups, Hispanic and non-Hispanic Caucasians. The allele and haplotype frequencies were different between the Mexican-Americans and non-Hispanic Caucasians. Furthermore, within the Mexican-Americans, there were positive and negative associations of particular haplotypes with coronary artery disease, suggesting a role of this gene in contributing to CAD in this population. Of note, the association of haplotype 1 with CAD remained significant after adjustment for age, despite the age difference between the cases and the controls. LPL haplotype 1 was associated with a decreased risk of coronary artery disease, both in terms of allele and haplogenotype frequency. With one haplotype clearly decreased in frequency in affected subjects, one also expects an increase, and this was most prominent for LPL haplotype 4, which exhibited a higher frequency in those Mexican-Americans with coronary artery disease but did not attain statistical significance given the available sample size.

In comparing two different ethnic groups, we found several differences in the allele and haplotype frequencies observed in the 3′LPL markers. Such differences may affect results of association studies conducted in different populations. In particular, different alleles of Hin dIII occurred at different frequencies, which may account for disparate results of association studies conducted in different populations. For example, a study of postmenopausal Caucasian women found no association of the Hin dIII variant with glucose or insulin levels, while a study in Chinese men with coronary heart disease found an association of Hin dIII with steady state plasma glucose levels, a marker of insulin resistance.9,17 Because our results strictly apply to Mexican-Americans living in Southern California, it will be informative to determine whether these haplotypes are associated with CAD in Mexicans living in other parts of the world.

We expect that the haplotypes described here will be useful in future studies exploring the association of the LPL gene with components of the cardiovascular dysmetabolic syndrome. This is illustrated here, in that haplotype frequencies differed according to coronary artery disease status, whereas only one of the six single polymorphic sites was associated with coronary artery disease. This demonstrates that the common approach of examining one or two polymorphisms per candidate gene may fail to detect phenotypic associations. Compared with single-variant analysis, haplotype-based analysis reduces the potential for false negatives in association studies. The benefit of a haplotype-based analysis is that it captures most of the common variation across a region, which should, as it did in our study, improve the ability to detect an association. This study thus demonstrates the improved power of haplotyping in elucidating disease-gene associations and the importance of ethnic-specific haplotype data.