Most common diseases are complex genetic traits, with multiple genetic and environmental components contributing to susceptibility. It has been proposed that common genetic variants, including single nucleotide polymorphisms (SNPs), influence susceptibility to common disease. This proposal has begun to be tested in numerous studies of association between genetic variation at these common DNA polymorphisms and variation in disease susceptibility. We have performed an extensive review of such association studies. We find that over 600 positive associations between common gene variants and disease have been reported; these associations, if correct, would have tremendous importance for the prevention, prediction, and treatment of most common diseases. However, most reported associations are not robust: of the 166 putative associations which have been studied three or more times, only 6 have been consistently replicated. Interestingly, of the remaining 160 associations, well over half were observed again one or more times. We discuss the possible reasons for this irreproducibility and suggest guidelines for performing and interpreting genetic association studies. In particular, we emphasize the need for caution in drawing conclusions from a single report of an association between a genetic variant and disease susceptibility.
For most common diseases, including heart disease, diabetes, hypertension, and cancer, multiple genetic and environmental factors influence an individual's risk of being affected. This complexity contrasts with the inheritance pattern of monogenic disorders, in which the presence or absence of disease alleles usually completely predicts the presence or absence of disease (although the severity or age of onset may vary). For genetically complex diseases, risk alleles are less deterministic and more probabilistic—the presence of a high-risk allele may only mildly increase the chance of disease. Furthermore, it has been proposed that these weakly penetrant alleles may be present at high frequency (>1%) in the population.1–3
The widespread presence of high frequency variants in humans was first shown experimentally by Harris among others,4 who found that many proteins have several common, heritable isoforms, thereby demonstrating that common genetic variation could lead to variation in protein structure. The widespread presence of such variation suggested that common variants might be biologically important. As Harris4 hypothesized in 1971 (see p. 272), “The other group of alleles, though numerically much fewer, are individually much more common. They [common DNA variants] provide the basis for the great variety of enzyme … polymorphisms which evidently occur. These are quite possibly the underlying biochemical cause of much of the inherited diversity in the physical and physiological characteristics of individuals, and also in relative susceptibilities to various diseases and other disorders.” Unfortunately, tests of this hypothesis were limited to proteins for which common functional variation could be easily assayed (primarily a few enzymes and determinants of blood group antigens).
The advent of gene cloning and sequencing substantially lowered this technical hurdle. It became possible to easily detect DNA variants in a given gene. The first genetic variants tested were usually restriction fragment length polymorphisms (RFLPs), but with the development of the polymerase chain reaction (PCR) and other improvements in technology, microsatellites, variable number tandem repeats (VNTRs), insertion/deletion polymorphisms, and single nucleotide polymorphisms (SNPs) could all be analyzed.
By determining the genotype of these variants in individuals with disease and in unaffected controls, these polymorphisms could be tested for association with susceptibility to a variety of diseases. Such studies, called “association studies,” have usually used a case-control design (although family-based designs have also been used; see below). In this design, the frequencies of the alleles or genotypes at the site of interest are compared in populations of cases and controls; a higher frequency in cases is taken as evidence that the allele or genotype is associated with increased risk of disease. The usual conclusion of such studies is that the polymorphism being tested either affects risk of disease directly or is a marker for some nearby genetic variant that affects risk of disease.
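As a minimal sketch of the case-control comparison described above, the following Python snippet tests a 2x2 table of allele counts with a Pearson chi-square statistic (1 degree of freedom) and reports the odds ratio. The function name and the counts are hypothetical and are not taken from any study in this review; the chi-square test is one common choice, not necessarily the test used in any given report.

```python
import math

def allele_association(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) and odds ratio for a 2x2 table of allele counts.

    case_a / ctrl_a: copies of the test allele in cases / controls.
    case_b / ctrl_b: copies of the other allele in cases / controls.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    observed = [case_a, case_b, ctrl_a, ctrl_b]
    expected = [row_case * col_a / n, row_case * col_b / n,
                row_ctrl * col_a / n, row_ctrl * col_b / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio

# Hypothetical counts: 200 cases and 200 controls, i.e., 400 alleles per group.
chi2, p_value, odds_ratio = allele_association(140, 260, 100, 300)
print(f"chi2={chi2:.2f}, P={p_value:.4f}, OR={odds_ratio:.2f}")
```

A higher allele frequency in cases gives an odds ratio above 1; whether that reflects a causal variant, a nearby correlated variant, or an artifact is the question taken up in the rest of this review.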
These association studies were further facilitated by the increasingly rapid discovery of common polymorphisms in genes, accomplished by resequencing the same stretch of DNA in multiple individuals. One of the goals of the Human Genome Project has been to identify large numbers of SNPs; indeed, the number of SNPs in public databases is now well over 1,000,000.5 As we describe below, association studies have already identified over 600 potential associations between common genetic variants and susceptibility to common disease. As the availability of known polymorphisms skyrockets, so too will the number of reported associations. It is, therefore, critical to have a framework in place by which one can evaluate and interpret these associations.
The purpose of this publication is to list and put into perspective many of the examples of associations in the recent literature, thereby providing an interim picture of this exciting and rapidly developing field. In addition, we will examine in detail two illustrative examples: (1) the association between deep venous thrombosis and factor V Leiden, a common polymorphism in the gene encoding clotting factor V, and (2) the association between various diseases and a common polymorphism in MTHFR, the gene encoding methylene tetrahydrofolate reductase. Finally, we will suggest some guidelines for the analysis of association studies, because proper evaluation of these associations is critical both to understanding the genetics of common disease and to informing recent discussions regarding screening for common genetic disease.
MATERIALS AND METHODS
We performed two independent reviews of the literature from 1986 through 2000 to identify published significant associations between common diseases or dichotomous traits and common polymorphisms (sites of genetic variation in which the minor allele frequency is at least 1%) in or near genes. We excluded monogenic disorders, because linkage analysis and positional cloning methods have been highly successful in identifying the alleles responsible for these diseases. Because of the large amount of prior literature, we also did not consider polymorphisms in HLA or blood group antigens, even though there are many robust associations between variation at these loci and disease. For simplicity, we have only included associations between variation at a single locus and susceptibility to disease in the entire population under study in the publication. In particular, we have not included associations between pairs of loci and susceptibility to disease nor associations between a polymorphism and susceptibility to disease in a subgroup of patients (such as smokers or those receiving hormone replacement therapy). We have thereby explicitly ignored reports of gene-gene and gene-environment interactions, even though some of these interactions may well be of great biologic and clinical interest. Finally, we have not listed associations with substance abuse (where phenotype definition is often murky), associations between polymorphisms and variation in laboratory findings (such as serum calcium levels), or associations with other quantitative, continuous traits (as opposed to dichotomous traits). Associations were considered significant if the nominal P value was < 0.05 or if the 95% confidence intervals for relative risk excluded 1.00.
REVIEW OF THE ASSOCIATION STUDY LITERATURE
We identified 268 genes that contain polymorphisms reported to be associated with 1 of 133 common diseases or dichotomous traits. In total, these 268 genes accounted for 603 different gene-disease associations. These associations are listed in Table 1, grouped according to the trait or disease under study. As seen in Figure 1, the number of new genes associated with diseases or traits has risen more or less steadily from 1993 to 2000. The temporary drop-off in 1999 and early 2000 likely reflects an emphasis on testing newly identified polymorphisms in previously studied genes (data not shown). Examination of Table 1 also shows that many genes have been associated with several different diseases; for example, polymorphisms in TNF, the gene encoding tumor necrosis factor alpha, have been associated with 20 different diseases or traits, whereas variants in ACE (encoding angiotensin converting enzyme), VDR (encoding the vitamin D receptor), and MTHFR (encoding methylene tetrahydrofolate reductase) have each been associated with over a dozen different diseases or traits (see also supplementary Table 1). As illustrative examples, we examine in more detail two of the associations in Table 1: the association of F5 (clotting factor V) and deep venous thrombosis, and the association between MTHFR and a variety of diseases.
The original report of an association between F5 and deep venous thrombosis grew out of observations that resistance to activated protein C, a biochemically defined phenotype, was associated with markedly increased risk of deep venous thrombosis.6 In an elegant study, the molecular basis of activated protein C resistance was shown to be a single nucleotide polymorphism in F5 encoding an arginine to glutamine change in codon 506 (Factor V Leiden; see Bertina et al.7). This change occurs at one of the protein C cleavage sites, thereby preventing inactivation of factor V by activated protein C and leading to a hypercoagulable state.8 Subsequent studies of this polymorphism have repeatedly demonstrated association with susceptibility to deep venous thrombosis, with P values often at or below 10⁻⁴ in individual studies (for example, Salomon et al.9). These studies were performed in several different populations, although the range of populations available for study is limited by the fact that Factor V Leiden is uncommon in non-Caucasian populations.10 Thus, this association is extremely robust in addition to having high biologic plausibility.
By contrast, associations involving common variation in MTHFR have not been as reproducible. A common thermolabile variant of methylene tetrahydrofolate reductase was first described in 1991. Thermolability of enzyme activity is inherited as a recessive trait11 and was eventually shown to be due to homozygosity for the “T” allele at a C/T polymorphism in nucleotide 677 (causing an alanine to valine change, see Frosst et al.12). Unlike the rare, more severe mutations in MTHFR which cause homocystinuria, the variant was not associated with neurologic deficits. However, thermolability of enzyme activity was observed to be associated with altered homocysteine levels and risk of coronary artery disease,11 findings that were confirmed in at least one subsequent study that looked at nucleotide 677 (see Gallagher et al. and Kluijtmans et al.13,14). Folate metabolism and homocysteine levels are connected with several clinical disorders, including coronary artery disease, deep venous thrombosis, neural tube defects, and cancer (see Bailey and Gregory15 for review); the thermolabile variant has been associated in different studies with increased risk of each of these diseases.13,14,16–20 However, despite the biologic plausibility of these associations, none have been reproducibly observed across many studies (for example, Ma et al.21–23).
If all of the associations listed in Table 1 could be replicated as consistently as factor V Leiden and deep venous thrombosis, this list would represent a significant understanding of the etiologies of most of the major human diseases. However, genetic associations more often behave like those seen with MTHFR: they are not consistently reproducible. To determine what fraction of the associations in Table 1 were robust, we first identified those associations for which an assessment of reproducibility could be made. These 166 associations (those for which we could find and review at least three separate publications) are listed in Table 2. Where more than one polymorphism in a gene was studied, the polymorphisms were treated separately. Although a significant effort was made to be complete, there are undoubtedly some well-studied associations that are not listed in Table 2. Nevertheless, we believe that this list is a reasonably accurate representation of the state of published association studies between polymorphisms and common genetic disease.
We reviewed the 166 associations in Table 2 to determine whether other studies of the same polymorphism and disease also reached statistical significance. Only six associations were reproduced at a high level of consistency (statistical significance was achieved in 75% or more of all identified studies). These six associations are listed in Table 3. The possibility of publication bias and consequent omission of “negative” studies means that six is actually an upper limit for the number of consistently reproducible associations. Of the associations in Table 3, the most reproducible was the association of ApoE4 and Alzheimer's disease, for which dozens of reports reach statistical significance. It should be noted, however, that the association is most robust in Caucasians (all identified reports achieved statistical significance); for other ethnic groups (Africans, African-Americans, and Hispanics), the association is sometimes more difficult to demonstrate.24–26
What could be the cause of the irreproducibility that characterizes the vast majority of association studies? One possibility is that the original observations represent statistical fluctuations (type I error). If this were the case, one would predict that only 5% of subsequent studies would also reach statistical significance with P < 0.05, and most associations would never be observed again. However, of the 166 associations listed in Table 2, at least 97 were observed again, many of them multiple times. Thus in the absence of a massive publication bias (selective publication of positive results with numerous negative studies remaining unpublished), statistical fluctuation is unlikely to explain all of the initial positive reports in Table 2.
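The type I error argument above can be made concrete with a short calculation. The sketch below assumes that each follow-up study is independent and that the true effect is zero, so each has a 5% chance of reaching P < 0.05; the number of follow-up studies per association (k) is a hypothetical input, since it varies across the associations in Table 2.

```python
def prob_any_replication(k, alpha=0.05):
    """Chance that at least one of k independent follow-up studies of a
    truly null association reaches nominal significance (P < alpha)."""
    return 1 - (1 - alpha) ** k

# Fraction of the 166 associations in Table 2 that were observed again.
observed_fraction = 97 / 166

# Hypothetical numbers of follow-up studies per association.
for k in (2, 5, 10):
    print(k, prob_any_replication(k), observed_fraction)
```

Under these assumptions, even ten null follow-up studies per association would yield at least one nominally significant result well under 97/166 of the time, which is the sense in which statistical fluctuation alone fits the data poorly unless many negative studies remain unpublished.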
Other possible causes of false-positive association studies have been previously identified and include ethnic admixture resulting in population stratification, variable linkage disequilibrium between the polymorphism being studied and the true causal variant, and population-specific gene-gene or gene-environment interactions.27–30 Each of these issues is addressed briefly in turn below, and possible remedies are offered. Finally, we examine the possibility that weak genetic effects combined with underpowered studies lead to significant numbers of falsely negative reports.
Most association studies have a case-control study design, in which allele or genotype frequencies in patients are compared with frequencies in an unaffected control population (Fig. 2 a). This study design is subject to population stratification due to ethnic admixture, which occurs when the cases and controls are unintentionally drawn from two or more ethnic groups or subgroups. If one of these subgroups has a higher disease prevalence than the others, stratification occurs, because that subgroup will be overrepresented in the cases and underrepresented in the controls. Any polymorphism that genetically marks the high-risk subgroup (i.e., is found by chance at a higher frequency in that subgroup), therefore, will appear to be associated with disease (Fig. 2 b) and will likely be a false positive. Interestingly, the frequencies of several of the alleles in Table 2 vary substantially between populations, consistent with the possibility of false associations due to ethnic admixture. It should be noted that well-defined subgroups are not necessary to observe stratification; stratification can also occur in a single admixed population where the individuals have varying degrees of genetic contributions from two or more ethnic groups. Even apparently homogeneous, isolated populations (such as Iceland) are in theory susceptible to admixture if there have been multiple distinct waves of migration from different source populations (e.g., Celtic and Norse, in the case of Iceland).
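The stratification mechanism described above can be illustrated with a small expected-value calculation. In this sketch the allele has no effect on disease within any subgroup; the subgroup fractions, prevalences, and allele frequencies are hypothetical numbers chosen only for illustration.

```python
def stratified_allele_freqs(groups):
    """Expected allele frequency among cases and among controls when the
    sampled population mixes subgroups.

    Each group is (fraction_of_population, disease_prevalence, allele_frequency).
    Within every subgroup the allele is unrelated to disease.
    """
    case_weights = [frac * prev for frac, prev, _ in groups]
    ctrl_weights = [frac * (1 - prev) for frac, prev, _ in groups]
    case_freq = sum(w * g[2] for w, g in zip(case_weights, groups)) / sum(case_weights)
    ctrl_freq = sum(w * g[2] for w, g in zip(ctrl_weights, groups)) / sum(ctrl_weights)
    return case_freq, ctrl_freq

# Hypothetical subgroups: the first has higher prevalence (10% vs 2%) and,
# by chance, a higher frequency of the marker allele (0.40 vs 0.10).
groups = [(0.5, 0.10, 0.40), (0.5, 0.02, 0.10)]
case_freq, ctrl_freq = stratified_allele_freqs(groups)
print(case_freq, ctrl_freq)
```

Because the high-prevalence subgroup is overrepresented among cases, the marker allele appears enriched in cases even though it is not causal in either subgroup; with a single homogeneous group the two frequencies are equal.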
What steps can be taken to prevent false-positive associations due to population stratification? Currently, several solutions can be attempted. First, one can use family-based studies such as the transmission disequilibrium test.31 This method, abbreviated TDT, requires affected offspring and their parents to test an allele for association with disease; the frequency with which heterozygous parents transmit that allele to offspring is then determined. This frequency is compared with the Mendelian expectation of 50:50 transmission of the allele. TDT (like other family-based methods) is immune to false positives from ethnic admixture.31 Disadvantages of the TDT are that family-based samples are often difficult to collect and that 50% more genotyping is required than in case-control studies to achieve similar power (the exact loss of power depends on the underlying genetic model). Another possibility is to study multiple case-control populations, each from different ethnic groups, and require that an association be seen in each population. Finally, an approach to detect and correct for stratification has been proposed: by typing several dozen random markers, one can empirically determine the degree of stratification in a case-control study.32–34 If significant stratification is detected, one can use these markers to more carefully match cases and controls to remove the effects of stratification.35 There is some debate as to whether stratification is a significant problem; some authors believe that even minimal ethnic matching of cases and controls is adequate to prevent stratification.36 However, there are as yet no empirical data that address the degree of stratification found in a typical association study.
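The TDT comparison against 50:50 transmission can be sketched as follows, using the commonly used McNemar-type statistic (b − c)²/(b + c), where b and c count how often heterozygous parents did and did not transmit the test allele to an affected child. The function name and the counts are hypothetical.

```python
import math

def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test from heterozygous parents.

    transmitted: times the test allele was passed to an affected offspring.
    untransmitted: times the other allele was passed instead.
    Returns the chi-square statistic (1 df) and its P value.
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: 100 heterozygous parents, 65 of whom transmitted the allele.
chi2, p_value = tdt(transmitted=65, untransmitted=35)
print(chi2, p_value)
```

Because each parent's untransmitted allele serves as the control, the comparison is made within families, which is why ethnic admixture between cases and controls does not bias it.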
Failure of replication can also occur if the polymorphism being tested is not itself the causal variant but is rather in linkage disequilibrium with the causal variant. Linkage disequilibrium, in which nearby variants are correlated with each other more often than expected by chance, depends heavily on population history and on the genetic make-up of the founders of that population. If all examples of a particular stretch of DNA in a population derive from a recent common ancestor, there will have been few opportunities for recombination events to separate variants within that stretch of DNA and the variants will often be inherited together throughout the population. If, in a different population, the time since a common ancestor is longer, more recombination events will have occurred, disrupting linkage disequilibrium in the region. Furthermore, the particular arrangement of variants in the founders of a population will determine which variants are inherited together. Thus, it is possible that a polymorphism will be in linkage disequilibrium with a nearby disease allele in one population but not in another, leading to variable results of association studies. For example, many of the associations with TNF in Table 1 might reflect associations with nearby HLA loci (HLA is a region with strong linkage disequilibrium over large distances). To explore this possibility, positive associations should be followed up by testing adjacent markers (both individually and as multi-marker haplotypes). If linkage disequilibrium is present (and particularly if any of the haplotypes or adjacent markers show stronger association), the possibility exists that the original marker tested is not the causal allele, and further studies of the region are warranted. Although it should be possible to exhaustively test modest-sized regions of linkage disequilibrium, special circumstances (e.g., recently admixed populations) may in theory give rise to correlation between markers at much greater distances.
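To make the notion of population-specific linkage disequilibrium concrete, the sketch below computes three standard pairwise measures (D, |D'|, and r²) from haplotype and allele frequencies. The two sets of frequencies are hypothetical and stand in for a marker and a causal variant in two populations with different histories.

```python
def ld_measures(p_ab, p_a, p_b):
    """Pairwise linkage disequilibrium between alleles A and B.

    p_ab: frequency of haplotypes carrying both A and B.
    p_a, p_b: frequencies of allele A and allele B.
    Returns (D, |D'|, r^2).
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical population 1: marker and causal allele always travel together.
strong = ld_measures(p_ab=0.20, p_a=0.20, p_b=0.20)
# Hypothetical population 2: same allele frequencies, but recombination has
# largely separated the two variants.
weak = ld_measures(p_ab=0.06, p_a=0.20, p_b=0.20)
print(strong, weak)
```

With identical allele frequencies, the marker is a near-perfect proxy for the causal variant in the first population and a poor one in the second, so an association study of the marker alone could succeed in one and fail in the other.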
GENE-GENE AND GENE-ENVIRONMENT INTERACTIONS
Another potential source of variable findings is gene-gene or gene-environment interactions that differ between populations. For example, if the effect of a variant were only manifest in populations with a particular genetic or environmental background, then association would only be seen in populations or subgroups with the appropriate genetic or environmental characteristics. This explanation is commonly invoked to explain differing results of association studies but is less frequently supported by direct evidence. A further problem arises when considering gene-gene or gene-environment interactions: when combinations of alleles and/or environmental factors are studied, P values are rarely corrected for the number of tests reported (much less the number of tests actually performed). Such “nominally” significant results must be considered to be the product of hypothesis generation rather than hypothesis testing and, therefore, require replication. Perhaps the best possible method of demonstrating that a gene-environment interaction is likely to be correct (and not a statistical fluctuation expected when exploring numerous hypotheses) is to divide the study population randomly into two parts and require that any findings be observed in both parts of the study. Sample sizes need to be increased slightly to maintain power, but the ability to generate and then test hypotheses in the same sample would seem to outweigh this consideration. Otherwise, one requires a replication population that is exactly matched for environmental and genetic background, an extremely unlikely scenario.
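The split-sample strategy suggested above might be sketched as follows. This is an illustration under stated assumptions rather than a prescribed protocol: the helper names, the interaction labels, and the P values are hypothetical, and applying a Bonferroni correction in the confirmation half (over only the hypotheses carried forward) is our choice for the example.

```python
import random

def split_sample(subject_ids, seed=0):
    """Randomly divide subjects into an exploration half and a confirmation half."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def confirmed(p_explore, p_confirm, alpha=0.05):
    """Hypotheses that are nominally significant in the exploration half and
    remain significant in the confirmation half after Bonferroni correction
    for the number of hypotheses carried forward."""
    carried = [h for h, p in p_explore.items() if p < alpha]
    if not carried:
        return []
    threshold = alpha / len(carried)
    return [h for h in carried if p_confirm[h] < threshold]

explore_ids, confirm_ids = split_sample(range(100))

# Hypothetical P values for three interaction hypotheses in each half.
p_explore = {"geneA x smoking": 0.01, "geneA x diet": 0.03, "geneB x smoking": 0.40}
p_confirm = {"geneA x smoking": 0.004, "geneA x diet": 0.20, "geneB x smoking": 0.01}
print(confirmed(p_explore, p_confirm))
```

Only the hypothesis that is significant in both halves survives; a result that appears in the confirmation half alone is not counted, because it was never generated as a hypothesis in the exploration half.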
WEAK GENETIC EFFECTS AND LACK OF POWER
Finally, associations can be real but nonetheless not reproducible if the underlying genetic effect is weak. If the subsequent studies are small in size, they will be underpowered to reliably detect weak effects and, therefore, fail to achieve statistical significance. This difficulty is heightened by the “jackpot” effect, in which the first group to publish a significant association involving a weak locus is more likely to have overestimated than underestimated the true effect of the polymorphism. This phenomenon occurs because each study imprecisely estimates the strength of the effect (due to sampling variation). Because a weak effect would in most cases not provide a statistically significant finding in a typically sized study (a few hundred cases and controls), the first published study that does manage to achieve statistical significance is almost certain to have overestimated the true effect of the variant being tested. Subsequent studies thus need to include much larger numbers of patients to achieve statistical significance. In particular, failure to observe the magnitude of effect seen in the first study should not be taken as a repudiation of the association. We observed this phenomenon for the association of type 2 diabetes and a Pro12Ala polymorphism in the PPARG gene, where an initial study estimated the effect on diabetes risk to be threefold,37 but subsequent studies observed very modest risks that usually did not achieve statistical significance.38–42 We tested the variant in several large populations and found that the effect on diabetes risk was modest (1.25-fold) but significant (P = 0.002 in our data alone29). Indeed, all of the previous studies, both positive and negative, were consistent with this 1.25-fold effect, and two subsequent large studies confirmed this association.43,44 Because many alleles may have similarly weak genetic effects, large studies and/or meta-analyses of multiple studies will often be required to determine whether genetic associations between polymorphisms and disease are significant.
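The relationship between effect size, sample size, and power discussed above can be approximated with a short calculation. The sketch below uses a normal approximation for a two-sided comparison of allele frequencies between equal numbers of cases and controls; the control allele frequency of 0.30 and the sample sizes are hypothetical, and the formula ignores the negligible chance of significance in the wrong direction.

```python
from statistics import NormalDist

def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)

def approx_power(n_per_group, control_freq, odds_ratio, alpha=0.05):
    """Approximate power to detect an allele-frequency difference with
    n_per_group cases and n_per_group controls (2 alleles per person)."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    alleles = 2 * n_per_group
    se = ((p0 * (1 - p0) + p1 * (1 - p1)) / alleles) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)

# Hypothetical scenarios: a weak effect (1.25-fold) in a small and a large
# study, and a strong effect (threefold) in a small study.
for n, effect in ((200, 1.25), (2000, 1.25), (200, 3.0)):
    print(n, effect, round(approx_power(n, 0.30, effect), 2))
```

Under these assumptions, a study of a few hundred cases and controls detects a 1.25-fold effect only a minority of the time, so the small studies that do reach significance tend to be those that happened to overestimate the effect.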
How does one tell whether reported associations between polymorphisms and disease are real? Reasonable criteria for declaring association have been proposed, including low P values, replication in multiple samples, and avoidance of population stratification (such as by using family-based controls28). However, most studies do not meet these criteria, and multiple studies of an association are usually inconsistent. In these cases, meta-analysis of all published studies may guide interpretation, and we strongly advocate that any publication of an association study (whether negative or positive) be accompanied by a meta-analysis of all similar studies. Accordingly, individual researchers should also publish or make easily available sufficient information to facilitate future meta-analysis, including relevant genotype and phenotype data. Publication bias may present a major challenge to such analyses, because the omission of small negative studies will bias the pooled data toward a positive result. In this regard, we advocate a mechanism for storage and dissemination of all association data (published or not), perhaps in a widely accepted and curated Web site and/or in brief “negative results” sections of specialty journals. Until complete meta-analyses can be performed using data from multiple large studies, we will be left with a scenario in which the majority of reported associations are in genetic purgatory, neither convincingly confirmed nor refuted, awaiting future judgment.
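One simple way to pool studies, sketched below, is a fixed-effect meta-analysis that weights each study's log odds ratio by the inverse of its variance (Woolf's method). This is offered as an illustration of the kind of analysis advocated above, not as the method used in any particular publication; the allele counts are hypothetical, and the approach assumes the studies estimate a common effect.

```python
import math

def pooled_odds_ratio(studies):
    """Fixed-effect, inverse-variance pooling of log odds ratios.

    studies: list of (case_a, case_b, ctrl_a, ctrl_b) allele counts.
    Returns (pooled OR, 95% confidence interval, two-sided P value).
    """
    weighted_sum = total_weight = 0.0
    for a, b, c, d in studies:
        log_or = math.log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d
        weighted_sum += log_or / variance
        total_weight += 1 / variance
    pooled_log_or = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)
    p_value = math.erfc(abs(pooled_log_or / se) / math.sqrt(2))
    ci = (math.exp(pooled_log_or - 1.96 * se), math.exp(pooled_log_or + 1.96 * se))
    return math.exp(pooled_log_or), ci, p_value

# Hypothetical allele counts from four studies of the same polymorphism.
studies = [(115, 285, 100, 300), (240, 560, 205, 595),
           (62, 138, 50, 150), (480, 1120, 420, 1180)]
pooled, ci, p_value = pooled_odds_ratio(studies)
print(pooled, ci, p_value)
```

Pooling requires the per-study counts, which is one reason for making genotype data easily available; it cannot, however, correct for negative studies that were never published.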
Much of the interest surrounding genetic association studies centers on the potential clinical application of polymorphisms that serve as markers for disease. In particular, it has been proposed that these markers can serve both as predictors of disease and as a means to tailor treatment of disease. Although this scenario may well become reality, the current irreproducibility of most studies should raise a loud cautionary alarm. Certainly, clinical applications of genetic associations should not be considered until the degree of certainty far exceeds the level currently achieved for the vast majority of such associations. Furthermore, even if an association is supported by extremely convincing evidence, screening patients is only appropriate if determining an individual's genotype would allow a clinically proven beneficial intervention that outweighs the risk of performing the test. Genetic tests also give rise to ethical considerations, because of the implication for family members, the potential for discrimination, the immutability of genetic risk factors, and the predictive nature of such tests. (Although, given the probable modest effects of any particular genetic variant, most genetic tests are likely to be much less predictive of future health than widely used screens such as blood pressure and cholesterol measurements.) Societal consensus and legislative solutions addressing these ethical concerns are needed before such testing enters widespread clinical practice.
Because of the scientific and ethical uncertainties, a “DNA chip” that can determine crucial genotypes and accurately predict future health is unlikely to become a widespread and useful screening tool in the near future, even if concerns regarding reproducibility can be resolved. Rather, the most likely short-term benefit from genetic association studies will be a better understanding of disease pathogenesis, which will hopefully lead in turn to novel and better treatments and/or more tailored drug therapy. If genetic association studies can provide these sorts of advances, they will have proven a valuable resource in the struggle to understand and treat common disease.
Lander ES, Schork NJ . Genetic dissection of complex traits. Science 1994; 265: 2037–2048.
Chakravarti A . Population genetics–making sense out of sequence. Nat Genet 1999; 21: 56–60.
Risch NJ . Searching for genetic determinants in the new millennium. Nature 2000; 405: 847–856.
Harris H . The principles of human biochemical genetics. Amsterdam: North-Holland Publishing Company, 1975.
Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, Sherry S, Mullikin JC, Mortimore BJ, Willey DL, Hunt SE, Cole CG, Coggill PC, Rice CM, Ning Z, Rogers J, Bentley DR, Kwok PY, Mardis ER, Yeh RT, Schultz B, Cook L, Davenport R, Dante M, Fulton L, Hillier L, Waterston RH, McPherson JD, Gilman B, Schaffner S, Van Etten WJ, Reich D, Higgins J, Daly MJ, Blumenstiel B, Baldwin J, Stange-Thomann N, Zody MC, Linton L, Lander ES, Atshuler D . A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001; 409: 928–933.
Dahlback B, Carlsson M, Svensson PJ . Familial thrombophilia due to a previously unrecognized mechanism characterized by poor anticoagulant response to activated protein C: prediction of a cofactor to activated protein C. Proc Natl Acad Sci U S A 1993; 90: 1004–1008.
Bertina RM, Koeleman BP, Koster T, Rosendaal FR, Dirven RJ, de Ronde H, van der Velden PA, Reitsma PH . Mutation in blood coagulation factor V associated with resistance to activated protein C. Nature 1994; 369: 64–67.
Aparicio C, Dahlback B . Molecular mechanisms of activated protein C resistance. Properties of factor V isolated from an individual with homozygosity for the Arg506 to Gln mutation in the factor V gene. Biochem J 1996; 313: 467–472.
Salomon O, Steinberg DM, Zivelin A, Gitel S, Dardik R, Rosenberg N, Berliner S, Inbal A, Many A, Lubetsky A, Varon D, Martinowitz U, Seligsohn U . Single and combined prothrombotic factors in patients with idiopathic venous thromboembolism: prevalence and risk assessment. Arterioscler Thromb Vasc Biol 1999; 19: 511–518.
Dilley A, Austin H, Hooper WC, El-Jamil M, Whitsett C, Wenger NK, Benson J, Evatt B . Prevalence of the prothrombin 20210 G-to-A variant in blacks: infants, patients with venous thrombosis, patients with myocardial infarction, and control subjects. J Lab Clin Med 1998; 132: 452–455.
Kang SS, Wong PW, Susmano A, Sora J, Norusis M, Ruggie N . Thermolabile methylenetetrahydrofolate reductase: an inherited risk factor for coronary artery disease. Am J Hum Genet 1991; 48: 536–545.
Frosst P, Blom HJ, Milos R, Goyette P, Sheppard CA, Matthews RG, Boers GJ, den Heijer M, Kluijtmans LA, van den Heuvel LP, et al. A candidate genetic risk factor for vascular disease: a common mutation in methylenetetrahydrofolate reductase. Nat Genet 1995; 10: 111–113.
Gallagher PM, Meleady R, Shields DC, Tan KS, McMaster D, Rozen R, Evans A, Graham IM, Whitehead AS. Homocysteine and risk of premature coronary heart disease. Evidence for a common gene mutation. Circulation 1996; 94: 2154–2158.
Kluijtmans LA, van den Heuvel LP, Boers GH, Frosst P, Stevens EM, van Oost BA, den Heijer M, Trijbels FJ, Rozen R, Blom HJ. Molecular genetic analysis in mild hyperhomocystinemia: a common mutation in the methylenetetrahydrofolate reductase gene is a genetic risk factor for cardiovascular disease. Am J Hum Genet 1996; 58: 35–41.
Bailey LB, Gregory JF. Polymorphisms of methylenetetrahydrofolate reductase and other enzymes: metabolic significance, risks and impact on folate requirement. J Nutr 1999; 129: 919–922.
van der Put NM, Steegers-Theunissen RP, Frosst P, Trijbels FJ, Eskes TK, van den Heuvel LP, Mariman EC, den Heyer M, Rozen R, Blom HJ. Mutated methylenetetrahydrofolate reductase as a risk factor for spina bifida. Lancet 1995; 346: 1070–1071.
Whitehead AS, Gallagher P, Mills JL, Kirke PN, Burke H, Molloy AM, Weir DG, Shields DC, Scott JM. A genetic defect in 5,10 methylenetetrahydrofolate reductase in neural tube defects. Q J Med 1995; 88: 763–766.
Ma J, Stampfer MJ, Giovannucci E, Artigas C, Hunter DJ, Fuchs C, Willett WC, Selhub J, Hennekens CH, Rozen R. Methylenetetrahydrofolate reductase polymorphism, dietary interactions, and risk of colorectal cancer. Cancer Res 1997; 57: 1098–1102.
Margaglione M, D'Andrea G, d'Addedda M, Giuliani N, Cappucci G, Iannaccone L, Vecchione G, Grandone E, Brancaccio V, Di Minno G. The methylenetetrahydrofolate reductase TT677 genotype is associated with venous thrombosis independently of the coexistence of the FV Leiden and the prothrombin A20210 mutation. Thromb Haemost 1998; 79: 907–911.
Skibola CF, Smith MT, Kane E, Roman E, Rollinson S, Cartwright RA, Morgan G. Polymorphisms in the methylenetetrahydrofolate reductase gene are associated with susceptibility to acute leukemia in adults. Proc Natl Acad Sci U S A 1999; 96: 12810–12815.
Ma J, Stampfer MJ, Hennekens CH, Frosst P, Selhub J, Horsford J, Malinow MR, Willett WC, Rozen R. Methylenetetrahydrofolate reductase polymorphism, plasma folate, homocysteine, and risk of myocardial infarction in US physicians. Circulation 1996; 94: 2410–2416.
Kluijtmans LA, den Heijer M, Reitsma PH, Heil SG, Blom HJ, Rosendaal FR. Thermolabile methylenetetrahydrofolate reductase and factor V Leiden in the risk of deep-vein thrombosis. Thromb Haemost 1998; 79: 254–258.
Morrison K, Papapetrou C, Hol FA, Mariman EC, Lynch SA, Burn J, Edwards YH. Susceptibility to spina bifida; an association study of five candidate genes. Ann Hum Genet 1998; 62: 379–396.
Osuntokun BO, Sahota A, Ogunniyi AO, Gureje O, Baiyewu O, Adeyinka A, Oluwole SO, Komolafe O, Hall KS, Unverzagt FW, et al. Lack of an association between apolipoprotein E epsilon 4 and Alzheimer's disease in elderly Nigerians. Ann Neurol 1995; 38: 463–465.
Farrer LA, Cupples LA, Haines JL, Hyman B, Kukull WA, Mayeux R, Myers RH, Pericak-Vance MA, Risch N, van Duijn CM. Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease. A meta-analysis. APOE and Alzheimer Disease Meta Analysis Consortium. JAMA 1997; 278: 1349–1356.
Tang MX, Stern Y, Marder K, Bell K, Gurland B, Lantigua R, Andrews H, Feng L, Tycko B, Mayeux R. The APOE-epsilon4 allele and the risk of Alzheimer disease among African Americans, whites, and Hispanics. JAMA 1998; 279: 751–755.
Altshuler D, Kruglyak L, Lander E. Genetic polymorphisms and disease. N Engl J Med 1998; 338: 1626.
Freely associating. Nat Genet 1999; 22: 1–2.
Altshuler D, Hirschhorn JN, Klannemark M, Lindgren CM, Vohl MC, Nemesh J, Lane CR, Schaffner SF, Bolk S, Brewer C, Tuomi T, Gaudet D, Hudson TJ, Daly M, Groop L, Lander ES. The common PPARgamma Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet 2000; 26: 76–80.
Cardon LR, Bell JI. Association study designs for complex diseases. Nat Rev Genet 2001; 2: 91–99.
Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993; 52: 506–516.
Pritchard JK, Rosenberg NA. Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 1999; 65: 220–228.
Devlin B, Roeder K. Genomic control for association studies. Biometrics 1999; 55: 997–1004.
Reich DE, Goldstein DB. Detecting association in a case-control study while correcting for population stratification. Genet Epidemiol 2001; 20: 4–16.
Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet 2000; 67: 170–181.
Morton NE, Collins A. Tests and estimates of allelic association in complex inheritance. Proc Natl Acad Sci U S A 1998; 95: 11389–11393.
Deeb SS, Fajas L, Nemoto M, Pihlajamaki J, Mykkanen L, Kuusisto J, Laakso M, Fujimoto W, Auwerx J. A Pro12Ala substitution in PPARgamma2 associated with decreased receptor activity, lower body mass index and improved insulin sensitivity. Nat Genet 1998; 20: 284–287.
Mancini FP, Vaccaro O, Sabatino L, Tufano A, Rivellese AA, Riccardi G, Colantuoni V. Pro12Ala substitution in the peroxisome proliferator-activated receptor-gamma2 is not associated with type 2 diabetes. Diabetes 1999; 48: 1466–1468.
Ringel J, Engeli S, Distler A, Sharma AM. Pro12Ala missense mutation of the peroxisome proliferator activated receptor gamma and diabetes mellitus. Biochem Biophys Res Commun 1999; 254: 450–453.
Clement K, Hercberg S, Passinge B, Galan P, Varroud-Vial M, Shuldiner AR, Beamer BA, Charpentier G, Guy-Grand B, Froguel P, Vaisse C. The Pro115Gln and Pro12Ala PPAR gamma gene mutations in obesity and type 2 diabetes. Int J Obes Relat Metab Disord 2000; 24: 391–393.
Hara K, Okada T, Tobe K, Yasuda K, Mori Y, Kadowaki H, Hagura R, Akanuma Y, Kimura S, Ito C, Kadowaki T. The Pro12Ala polymorphism in PPAR gamma2 may confer resistance to type 2 diabetes. Biochem Biophys Res Commun 2000; 271: 212–216.
Meirhaeghe A, Fajas L, Helbecque N, Cottel D, Auwerx J, Deeb SS, Amouyel P. Impact of the peroxisome proliferator activated receptor gamma2 Pro12Ala polymorphism on adiposity, lipids and non-insulin-dependent diabetes mellitus. Int J Obes Relat Metab Disord 2000; 24: 195–199.
Douglas JA, Erdos MR, Watanabe RM, Braun A, Johnston CL, Oeth P, Mohlke KL, Valle TT, Ehnholm C, Buchanan TA, Bergman RN, Collins FS, Boehnke M, Tuomilehto J. The peroxisome proliferator-activated receptor-gamma2 Pro12Ala variant: association with type 2 diabetes and trait differences. Diabetes 2001; 50: 886–890.
Mori H, Ikegami H, Kawaguchi Y, Seino S, Yokoi N, Takeda J, Inoue I, Seino Y, Yasuda K, Hanafusa T, Yamagata K, Awata T, Kadowaki T, Hara K, Yamada N, Gotoda T, Iwasaki N, Iwamoto Y, Sanke T, Nanjo K, Oka Y, Matsutani A, Maeda E, Kasuga M. The Pro12→Ala substitution in PPAR-gamma is associated with resistance to development of diabetes in the general population: possible involvement in impairment of insulin secretion in individuals with type 2 diabetes. Diabetes 2001; 50: 891–894.
J.N.H. is a recipient of a Postdoctoral Fellowship for Physicians from the Howard Hughes Medical Institute and a Burroughs Wellcome Career Award in the Biomedical Sciences. K.L. and E.B. were supported by grants from Affymetrix Inc., Millennium Pharmaceuticals, Inc., and Bristol-Myers Squibb Company to Eric S. Lander, Whitehead/MIT Center for Genome Research, Cambridge, Massachusetts. We thank David Altshuler, Pamela Sklar, Eric Lander, and C. Leigh Pearce for helpful comments, and Delores Gray for assistance in locating manuscripts. Supplementary information (full citations for references 45 to 663 and Supplementary Table 1) can be found at http://www.geneticsinmedicine.org.
Cite this article
Hirschhorn, J., Lohmueller, K., Byrne, E. et al. A comprehensive review of genetic association studies. Genet Med 4, 45–61 (2002). https://doi.org/10.1097/00125817-200203000-00002
- human genetics
- association studies
- common disease