Genetic association analysis is a popular approach for identifying genetic variation that correlates with phenotypic variation, such as susceptibility to complex disease.
Association studies have a chequered history. Many published associations cannot be reproduced or substantiated by linkage data.
Genetic association occurs as a result of linkage disequilibrium (LD). But LD levels vary within the genome and between populations, making it difficult to predict the best sample populations for a particular study.
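Pairwise LD between two biallelic markers is commonly summarized by three measures: the coefficient D, its normalized form D′, and r². The minimal Python sketch below computes all three from phased haplotype frequencies. The function name `ld_measures` and its inputs are hypothetical. The sketch assumes that both loci are polymorphic and that haplotype frequencies have already been estimated.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> dict:
    """Pairwise LD statistics for two biallelic loci (hypothetical helper).

    p_ab -- frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a  -- frequency of allele A at locus 1
    p_b  -- frequency of allele B at locus 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D: deviation of the haplotype frequency from that expected under independence
    d = p_ab - p_a * p_b
    # D' divides D by the largest |D| the allele frequencies allow, so it lies
    # between -1 and 1; |D'| = 1 means at least one of the four haplotypes is absent
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    # r^2: squared correlation between alleles carried on the same haplotype
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return {"D": d, "D_prime": d_prime, "r2": r2}
```

For example, `ld_measures(0.5, 0.6, 0.6)` gives D = 0.14, D′ ≈ 0.58 and r² ≈ 0.34. D′ can equal 1 while r² stays small when the two markers have very different allele frequencies, so the choice of measure affects how strong LD appears.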
The most popular sampling strategy is the case-control study. Selection of the control population is key to the success of this approach, and small sample sizes or poorly matched controls are sources of error in association studies.
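The basic case-control analysis compares allele counts in cases and controls. The minimal Python sketch below has two functions. The names `allelic_test` and `allelic_power` are hypothetical and do not come from any package.

- The first function computes Pearson's chi-square (1 df, no continuity correction) and the allelic odds ratio from a 2×2 table. It treats each chromosome as an independent observation, which assumes Hardy–Weinberg equilibrium.
- The second approximates the power of that test for given allele frequencies and sample sizes. This illustrates why small samples are a source of error.

```python
import math
from statistics import NormalDist


def allelic_test(case_a, case_other, ctrl_a, ctrl_other):
    """Pearson chi-square (1 df, no continuity correction) and allelic odds
    ratio for a 2x2 table of allele counts in cases and controls."""
    table = [[case_a, case_other], [ctrl_a, ctrl_other]]
    total = case_a + case_other + ctrl_a + ctrl_other
    row_sums = [sum(row) for row in table]
    col_sums = [case_a + ctrl_a, case_other + ctrl_other]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # upper-tail probability of a chi-square with 1 df equals P(|Z| > sqrt(chi2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * ctrl_other) / (case_other * ctrl_a)
    return chi2, p_value, odds_ratio


def allelic_power(p_case, p_control, n_cases, n_controls, alpha=0.05):
    """Approximate power of the two-sided allelic test.

    Uses the normal approximation for a difference in proportions, counts two
    alleles per person and ignores the tiny chance of rejecting in the wrong
    direction.
    """
    z = NormalDist()
    se = math.sqrt(p_case * (1 - p_case) / (2 * n_cases)
                   + p_control * (1 - p_control) / (2 * n_controls))
    return z.cdf(abs(p_case - p_control) / se - z.inv_cdf(1 - alpha / 2))
```

Consider 60 versus 40 copies of the allele among 100 case and 100 control chromosomes. Here `allelic_test(60, 40, 40, 60)` returns χ² = 8.0, P ≈ 0.005 and an odds ratio of 2.25.

For a more modest difference (allele frequency 0.30 in cases versus 0.25 in controls), `allelic_power` gives roughly 35% power with 200 cases and 200 controls. With 1,000 of each, power rises to about 94%. Both figures use α = 0.05, before any multiple-testing correction.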
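Prospective study designs can avoid some of the biases of case-control studies, such as poorly matched controls. However, they require large sample sizes, because many participants must be followed before enough of them develop disease. Family-based studies are also useful in overcoming errors due to population stratification.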
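The transmission/disequilibrium test (TDT) of Spielman et al., listed in the references, shows why family-based designs resist stratification. Only heterozygous parents are informative. The allele that each such parent did not transmit to an affected child serves as a matched control from the same ancestral background.

The following is a minimal Python sketch for one allele at a biallelic marker; the function name `tdt` is hypothetical.

```python
import math


def tdt(transmitted: int, untransmitted: int):
    """Transmission/disequilibrium test for one allele at a biallelic marker.

    transmitted   -- times heterozygous parents passed the tested allele to an affected child
    untransmitted -- times those parents passed the other allele instead
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    # McNemar-type statistic: under no linkage or no association, each
    # heterozygous parent transmits either allele with probability 1/2
    chi2 = (transmitted - untransmitted) ** 2 / n
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p_value
```

For 60 transmissions versus 40 non-transmissions of the tested allele, `tdt(60, 40)` gives χ² = 4.0 on 1 degree of freedom (P ≈ 0.046).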
Multiple testing of the same population, or of population subgroups, is another source of error, because each additional test raises the chance of a false-positive (Type I) result.
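The scale of the multiple-testing problem is easy to quantify. If m independent tests are each performed at significance level α, the chance of at least one false positive is 1 - (1 - α)^m. The Bonferroni correction keeps the overall level at or below α by testing each hypothesis at α/m.

The following is a short Python sketch; both function names are hypothetical.

```python
def family_wise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("need at least one test")
    return alpha / n_tests
```

Twenty independent tests at α = 0.05 give roughly a 64% chance of at least one spurious "hit", and a Bonferroni threshold of 0.0025 per test. Bonferroni is conservative when tests are correlated, as they are for markers in LD or for overlapping subgroups of the same sample.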
With the availability of the human genome sequence and new methods for genotyping single-nucleotide polymorphisms (SNPs), association studies will become increasingly popular. Applications will include whole-genome screens and regional LD mapping.
More rigorous study design, independent replication of data and careful attention to the effects of multiple testing are among the recommendations that will improve the value of association data in the future.
Assessing the association between DNA variants and disease has been used widely to identify regions of the genome and candidate genes that contribute to disease. However, there are numerous examples of associations that cannot be replicated, which has led to scepticism about the utility of the approach for common conditions. With the discovery of massive numbers of genetic markers and the development of better tools for genotyping, association studies will inevitably proliferate. Now is the time to consider critically the design of such studies, to avoid the mistakes of the past and to maximize their potential to identify new components of disease.
Mullikin, J. C. et al. An SNP map of human chromosome 22. Nature 407, 516–520 (2000).
Altshuler, D. et al. An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407, 513–516 (2000).
Drews, J. & Ryser, S. The role of innovation in drug development. Nature Biotechnol. 15, 1318–1319 (1997).
Terwilliger, J. D. & Weiss, K. M. Linkage disequilibrium mapping of complex disease: fantasy or reality? Curr. Opin. Biotechnol. 9, 578–594 (1998).
Gambaro, G., Anglani, F. & D'Angelo, A. Association studies of genetic polymorphisms and complex disease. Lancet 355, 308–311 (2000).
Weiss, K. M. & Terwilliger, J. D. How many diseases does it take to map a gene with SNPs? Nature Genet. 26, 151–157 (2000). This paper is essential reading for anyone undertaking association studies of common characters. The primary aim is to elucidate the difficulties in identifying genetic loci that contribute to complex traits. The literature cited covers some necessary population genetics material.
Risch, N. J. Searching for genetic determinants in the new millennium. Nature 405, 847–856 (2000). An excellent summary of current statistical procedures and their comparative strengths and weaknesses for complex trait mapping. Very useful for comparing linkage and association and for distinguishing familial influences on discrete versus quantitative traits.
Schork, N. J., Cardon, L. R. & Xu, X. The future of genetic epidemiology. Trends Genet. 14, 266–272 (1998).
Collins, F. Positional cloning moves from perditional to traditional. Nature Genet. 9, 347–350 (1995).
Lander, E. S. & Schork, N. J. Genetic dissection of complex traits. Science 265, 2037–2048 (1994).
Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996).
Jorde, L. B. Linkage disequilibrium and the search for complex disease genes. Genome Res. 10, 1435–1444 (2000).
Xiong, M. & Guo, S. W. Fine-scale genetic mapping based on linkage disequilibrium: theory and applications. Am. J. Hum. Genet. 60, 1513–1531 (1997).
Freimer, N. B. et al. Genetic mapping using haplotype, association and linkage methods suggests a locus for severe bipolar disorder (BPI) at 18q22-q23. Nature Genet. 12, 436–441 (1996).
Hastbacka, J. et al. Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nature Genet. 2, 204–211 (1992). This is becoming a classic paper on using disequilibrium/haplotype data to identify disease loci. The trait studied does not reflect the common disease framework of current widespread interest, but the procedures used offer a useful model from which to start.
Collins, A., Lonjou, C. & Morton, N. E. Genetic epidemiology of single-nucleotide polymorphisms. Proc. Natl Acad. Sci. USA 96, 15173–15177 (1999). One of a series of key papers by these authors who compare disequilibrium measures, evaluate real data patterns to infer genome-wide marker spacing requirements, and combine population genetics principles with those of disease-gene mapping to characterize allelic association.
Eaves, I. A. et al. The genetically isolated populations of Finland and Sardinia may not be a panacea for linkage disequilibrium mapping of common disease genes. Nature Genet. 25, 320–323 (2000).
Taillon-Miller, P. et al. Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28. Nature Genet. 25, 324–328 (2000).
Nickerson, D. A. et al. DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nature Genet. 19, 233–240 (1998).
Clark, A. G. et al. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Hum. Genet. 63, 595–612 (1998).
Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet. 22, 231–238 (1999).
Halushka, M. K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nature Genet. 22, 239–247 (1999).
Templeton, A. R. et al. Recombinational and mutational hotspots within the human lipoprotein lipase gene. Am. J. Hum. Genet. 66, 69–83 (2000).
Ott, J. Predicting the range of linkage disequilibrium. Proc. Natl Acad. Sci. USA 97, 2–3 (2000).
Chapman, N. H. & Thompson, E. A. Linkage disequilibrium mapping: the role of population history, size, and structure. Adv. Genet. 42, 413–437 (2001).
Fisher, R. A. The rhesus factor: a study in scientific method. Am. Sci. 35, 95–103 (1947).
Tiwari, J. L. & Terasaki, P. I. HLA and Disease Associations (Springer, New York, 1985).
Lander, E. S. Array of hope. Nature Genet. 21, 3–4 (1999).
Risch, N. & Teng, J. Design and analysis of linkage disequilibrium studies for complex human diseases. Am. J. Hum. Genet. 61, 1707 (1997).
Risch, N. & Teng, J. The relative power of family-based and case–control designs for linkage disequilibrium studies of complex human diseases. I. DNA pooling. Genome Res. 8, 1273–1288 (1998).
Teng, J. & Risch, N. The relative power of family-based and case–control designs for linkage disequilibrium studies of complex human diseases. II. Individual genotyping. Genome Res. 9, 234–241 (1999).
Keavney, B. Genetic association studies in complex diseases. J. Hum. Hypertens. 14, 361–367 (2000).
Keavney, B. et al. Large-scale test of hypothesised associations between the angiotensin-converting-enzyme insertion/deletion polymorphism and myocardial infarction in about 5000 cases and 6000 controls. International Studies of Infarct Survival (ISIS) Collaborators. Lancet 355, 434–442 (2000). The need for association studies to involve thousands of patients is clearly shown by comparing the results of a number of typical, small studies with that of a large-scale, well-controlled design. Reference 33 offers a similar example for non-insulin-dependent diabetes mellitus.
Altshuler, D. et al. The common PPARgamma Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nature Genet. 26, 76–80 (2000).
Cambien, F. et al. Deletion polymorphism in the gene for angiotensin-converting enzyme is a potent risk factor for myocardial infarction. Nature 359, 641–644 (1992).
Arnheim, N., Strange, C. & Erlich, H. Use of pooled DNA samples to detect linkage disequilibrium of polymorphic restriction fragments and human disease: studies of the HLA class II loci. Proc. Natl Acad. Sci. USA 82, 6970–6974 (1985).
Barcellos, L. F. et al. Association mapping of disease loci, by use of a pooled DNA genomic screen. Am. J. Hum. Genet. 61, 734–747 (1997).
Daniels, J. et al. A simple method for analyzing microsatellite allele image patterns generated from DNA pools and its application to allelic association studies. Am. J. Hum. Genet. 62, 1189–1197 (1998).
Shaw, S. H., Carrasquillo, M. M., Kashuk, C., Puffenberger, E. G. & Chakravarti, A. Allele frequency distributions in pooled DNA samples: applications to mapping complex disease genes. Genome Res. 8, 111–123 (1998).
Kirov, G., Williams, N., Sham, P., Craddock, N. & Owen, M. J. Pooled genotyping of microsatellite markers in parent-offspring trios. Genome Res. 10, 105–115 (2000).
Falk, C. T. & Rubinstein, P. Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Ann. Hum. Genet. 51, 227–233 (1987).
Spielman, R. S., McGinnis, R. E. & Ewens, W. J. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus. Am. J. Hum. Genet. 52, 506–516 (1993). The TDT test and its immediate predecessors changed the way human genetic studies were conducted throughout the past decade. This is the original paper describing the method.
Spielman, R. S., McGinnis, R. E. & Ewens, W. J. The transmission/disequilibrium test detects cosegregation and linkage. Am. J. Hum. Genet. 54, 559–560 (1994).
Spielman, R. S. & Ewens, W. J. The TDT and other family-based tests for linkage disequilibrium and association. Am. J. Hum. Genet. 59, 983–989 (1996).
Sham, P. C. & Curtis, D. An extended transmission/disequilibrium test (TDT) for multiallelic marker loci. Ann. Hum. Genet. 59, 323–326 (1995).
Spielman, R. S. & Ewens, W. J. A sibship test for linkage in the presence of association: The sib transmission/disequilibrium test. Am. J. Hum. Genet. 62, 450–458 (1998).
Curtis, D. Use of siblings as controls in case–control association studies. Ann. Hum. Genet. 61, 319–333 (1997).
Martin, E. R., Kaplan, N. L. & Weir, B. S. Tests for linkage and association in nuclear families. Am. J. Hum. Genet. 61, 439–448 (1997).
Allison, D. B. Transmission-disequilibrium tests for quantitative traits. Am. J. Hum. Genet. 60, 676–690 (1997).
Rabinowitz, D. A transmission disequilibrium test for quantitative trait loci. Hum. Hered. 47, 342–350 (1997).
Abecasis, G. R., Cardon, L. R. & Cookson, W. O. A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet. 66, 279–292 (2000).
Martin, E. R., Monks, S. A., Warren, L. L. & Kaplan, N. L. A test for linkage and association in general pedigrees: The pedigree disequilibrium test. Am. J. Hum. Genet. 67, 146–154 (2000).
Pritchard, L. E. et al. Analysis of the CD3 gene region and type 1 diabetes: application of fluorescence-based technology to linkage disequilibrium mapping. Hum. Mol. Genet. 4, 197–202 (1995).
Bennett, S. T. & Todd, J. A. Human type 1 diabetes and the insulin gene: Principles of mapping polygenes. Annu. Rev. Genet. 30, 343–370 (1996).
Bennett, S. T. et al. Insulin VNTR allele-specific effect in type 1 diabetes depends on identity of untransmitted paternal allele. The IMDIAB Group. Nature Genet. 17, 350–352 (1997).
Merriman, T. R. et al. Transmission of haplotypes of microsatellite markers rather than single marker alleles in the mapping of a putative type 1 diabetes susceptibility gene (IDDM6). Hum. Mol. Genet. 7, 517–524 (1998).
Eaves, I. A. et al. Transmission ratio distortion at the INS-IGF2 VNTR. Nature Genet. 22, 324–325 (1999).
Lernmark, A. & Ott, J. Sometimes it's hot, sometimes it's not. Nature Genet. 19, 213–214 (1998).
Goring, H. H. & Terwilliger, J. D. Linkage analysis in the presence of errors IV: joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified. Am. J. Hum. Genet. 66, 1310–1327 (2000).
Morton, N. E. & Collins, A. Tests and estimates of allelic association in complex inheritance. Proc. Natl Acad. Sci. USA 95, 11389–11393 (1998).
Riordan, J. R. et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 245, 1066–1073 (1989).
Rommens, J. M. et al. Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 245, 1059–1065 (1989).
Kerem, B. et al. Identification of the cystic fibrosis gene: genetic analysis. Science 245, 1073–1080 (1989).
Huntington's Disease Collaborative Research Group. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes. Cell 72, 971–983 (1993).
Martin, E. R. et al. SNPing away at complex diseases: analysis of single-nucleotide polymorphisms around APOE in Alzheimer disease. Am. J. Hum. Genet. 67, 383–394 (2000).
Martin, E. R. et al. Analysis of association at single nucleotide polymorphisms in the APOE region. Genomics 63, 7–12 (2000).
Horikawa, Y. et al. Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nature Genet. 26, 163–175 (2000).
Roses, A. D. Pharmacogenetics and the practice of medicine. Nature 405, 857–865 (2000).
Keavney, B. et al. Measured haplotype analysis of the angiotensin-I converting enzyme gene. Hum. Mol. Genet. 7, 1745–1751 (1998). The ACE locus and ACE phenotype are a model quantitative system. Despite the unusually clear haplotype relationships in this gene and population, the study clearly demonstrates the difficulty in distinguishing which specific variants are responsible for phenotypic variability.
Moffatt, M. F., Traherne, J. A., Abecasis, G. R. & Cookson, W. O. Single nucleotide polymorphism and linkage disequilibrium within the TCR alpha/delta locus. Hum. Mol. Genet. 9, 1011–1019 (2000).
Abecasis, G. R. et al. Patterns of linkage disequilibrium from three genomic regions. Am. J. Hum. Genet. 68, 191–197 (2001).
Farrall, M. et al. Fine-mapping of an ancestral recombination breakpoint in DCP1. Nature Genet. 23, 270–271 (1999).
Abecasis, G. R., Cookson, W. O. & Cardon, L. R. Pedigree tests of transmission disequilibrium. Eur. J. Hum. Genet. 8, 545–551 (2000).
Todd, J. A. et al. Identification of susceptibility loci for insulin-dependent diabetes mellitus by trans-racial gene mapping. Nature 338, 587–589 (1989).
Mijovic, C. H., Barnett, A. H. & Todd, J. A. Genetics of diabetes. Trans-racial gene mapping studies. Baillieres Clin. Endocrinol. Metab. 5, 321–340 (1991).
Cardon, L. R. & Watkins, H. Waiting for the working draft from the human genome project: A huge achievement, but not of immediate medical use. Br. Med. J. 320, 1221–1222 (2000).
Kruglyak, L. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nature Genet. 22, 139–144 (1999). Mathematical population genetics modelling is used to simulate background levels of linkage disequilibrium in the genome, indicating that very fine-scale maps are required for disease gene association mapping. Although hotly contested and not always supported by empirical reports, this paper clearly outlines the issues and importance of disequilibrium levels in the genome.
Collins, A. & Morton, N. E. Mapping a disease locus by allelic association. Proc. Natl Acad. Sci. USA 95, 1741–1745 (1998).
Cox, N. J. et al. Loci on chromosomes 2 (NIDDM1) and 15 interact to increase susceptibility to diabetes in Mexican Americans. Nature Genet. 21, 213–215 (1999).
Risch, N. Evolving methods in genetic epidemiology. 2. Genetic linkage from an epidemiologic perspective. Epidemiol. Rev. 19, 24–32 (1997).
Potter, J. D. At the interfaces of epidemiology, genetics and genomics. Nature Rev. Genet. 2, 142–147 (2001).
Khoury, M. J., Beaty, T. H. & Cohen, B. H. Fundamentals of Genetic Epidemiology (Oxford Univ. Press, Oxford, 1993).
Huttley, G. A., Smith, M. W., Carrington, M. & O'Brien, S. J. A scan for linkage disequilibrium across the human genome. Genetics 152, 1711–1722 (1999).
Goddard, K. A., Hopkins, P. J., Hall, J. M. & Witte, J. S. Linkage disequilibrium and allele-frequency distributions for 114 single-nucleotide polymorphisms in five populations. Am. J. Hum. Genet. 66, 216–234 (2000).
Majewski, J. & Ott, J. GT repeats are associated with recombination on human chromosome 22. Genome Res. 10, 1108–1114 (2000).
Abbott, A. Manhattan versus Reykjavik. Nature 406, 340–342 (2000).
Borecki, I. B. & Suarez, B. K. Linkage and association: basic concepts. Adv. Genet. 42, 45–66 (2001).
Slatkin, M. Linkage disequilibrium in growing and stable populations. Genetics 137, 331–336 (1994).
Hartl, D. L. & Clark, A. G. Principles of Population Genetics (Sinauer Associates, Sunderland, MA, 1997).
Pritchard, J. K. & Rosenberg, N. A. Use of unlinked genetic markers to detect population stratification in association studies. Am. J. Hum. Genet. 65, 220–228 (1999). This paper describes the use of unlinked genetic markers to detect population stratification, with minimal mathematical complexity. The key issues of marker spacing and informativeness are evaluated in detail. Reference 94 should be read in follow-up of this paper to see how stratification can be accounted for when it is present.
Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).
Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).
Pritchard, J. K., Stephens, M., Rosenberg, N. A. & Donnelly, P. Association mapping in structured populations. Am. J. Hum. Genet. 67, 170–181 (2000).
Bacanu, S. A., Devlin, B. & Roeder, K. The power of genomic control. Am. J. Hum. Genet. 66, 1933–1944 (2000).
Witte, J. S., Elston, R. C. & Schork, N. J. Genetic dissection of complex traits. Nature Genet. 12, 355–358 (1996).
This work was supported by the Wellcome Trust and in part by a grant from the NIH (to L.R.C.). We wish to thank Dr Joe Terwilliger for critical review of this manuscript.
- POWER
The probability of correctly rejecting the null hypothesis when it is truly false. For association studies, the power can be considered as the probability of correctly detecting a genuine association.
- GENETIC DRIFT
The random fluctuation in allele frequencies as genes are transmitted from one generation to the next.
- POPULATION ADMIXTURE
A population that includes multiple subgroups. Admixture often refers to interbreeding between individuals from previously separate groups, but it is also used more simply to denote a population of subgroups that have different allele frequencies (see population stratification).
- PROSPECTIVE COHORT
Longitudinal study of individuals initially assessed for exposure to certain risk factors and then followed over time to evaluate the progression towards specific outcomes (often disease).
- LOCUS HETEROGENEITY
The appearance of phenotypically similar characteristics resulting from mutations at different genetic loci. Differences in effect size or in replication between studies and samples are often ascribed to different loci leading to the same disease.
- POPULATION STRATIFICATION
The presence of multiple subgroups with different allele frequencies within a population. These allele-frequency differences might be unrelated to the disease within each subgroup. However, they can still lead to erroneous conclusions of linkage disequilibrium or disease relevance if the subgroups are represented in different proportions among cases and controls, for example because disease prevalence differs between them.
- TYPE I ERROR
The probability of rejecting the null hypothesis when it is true. For association studies, Type I errors are manifest as false-positive reports of phenotype–genotype correlation.
- RISK RATIO
A measure of association effect reflecting the probability of disease in people with a particular allele or genotype versus the probability of disease in those who do not have the particular genotype.
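A small worked example makes the population stratification and admixture entries above concrete. The Python sketch below pools hypothetical subgroups in which cases and controls have identical allele frequencies, so the odds ratio within each subgroup is exactly 1. The subgroups are, however, sampled in different case:control proportions. The sketch uses expected rather than sampled allele counts. The function name `pooled_allelic_odds_ratio` and all numbers are invented for illustration.

```python
def pooled_allelic_odds_ratio(strata):
    """Allelic odds ratio after pooling subgroups with no within-group association.

    strata -- iterable of (n_cases, n_controls, allele_freq) tuples, one per
    subgroup; cases and controls in a subgroup share the same allele
    frequency, so each subgroup on its own has an odds ratio of exactly 1.
    """
    case_a = case_other = ctrl_a = ctrl_other = 0.0
    for n_cases, n_controls, p in strata:
        # expected allele counts, two alleles per person
        case_a += 2 * n_cases * p
        case_other += 2 * n_cases * (1 - p)
        ctrl_a += 2 * n_controls * p
        ctrl_other += 2 * n_controls * (1 - p)
    return (case_a * ctrl_other) / (case_other * ctrl_a)
```

Suppose 400 cases and 100 controls come from a subgroup with allele frequency 0.6, and 100 cases and 400 controls come from a subgroup with frequency 0.2. Pooling them yields an allelic odds ratio of about 2.8, even though the allele has no effect within either subgroup. If each subgroup contributes equal numbers of cases and controls, the pooled odds ratio returns to 1.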