
Association studies for finding cancer-susceptibility genetic variants

Key Points

  • The polygenic model for cancer susceptibility indicates that much of the inherited risk of cancer is due to multiple risk alleles, each with a low to moderate risk. The number of such alleles for any specific cancer is unknown, but might be in the hundreds or thousands.

  • Although linkage studies have been highly successful in mapping the genes that underlie monogenic disorders, these studies are of limited use for investigating predisposition to polygenic disease, such as cancer. Genetic-association studies — or case–control studies — provide an efficient design for identifying common genetic variants that confer modest disease risks.

  • Few convincing cancer-susceptibility alleles have been identified so far using the genetic-association study design. The limited success of these studies can be attributed mainly to the use of small study sizes — which provide insufficient statistical power and give a high rate of false positives — and limitations in the selection of candidate genes.

  • The rapid acquisition of data on the occurrence of common single-nucleotide polymorphisms (SNPs) has made it possible to test for the association of a candidate gene or region with disease using a tagging-SNP approach.

  • Several approaches can be used to increase the efficiency of candidate-gene association studies, such as improving the selection of candidate genes that are likely to be associated with cancer predisposition and enriching for genetic susceptibility by studying families with a history of cancer.

  • A combination of cheaper genotyping technologies with efficient study design will make empirical, whole-genome studies a feasible prospect in the near future.

  • Elucidating how multiple susceptibility alleles interact with each other and with lifestyle and environmental factors will be a key future challenge for the molecular and genetic epidemiology of cancer predisposition.
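The key point about small studies lacking statistical power can be made concrete with a rough sample-size calculation. The sketch below is illustrative only and is not taken from the article: it assumes a multiplicative (per-allele) risk model, a rare disease so that control allele frequencies approximate population frequencies, equal numbers of cases and controls, and a standard two-proportion normal approximation. The function names `case_allele_frequency` and `cases_required` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_frequency(control_freq: float, per_allele_rr: float) -> float:
    """Expected risk-allele frequency in cases under a multiplicative model.

    Assumption: the disease is rare, so the control frequency stands in
    for the population frequency.
    """
    return control_freq * per_allele_rr / (1 - control_freq + control_freq * per_allele_rr)


def cases_required(control_freq: float, per_allele_rr: float,
                   alpha: float = 0.05, power: float = 0.9) -> int:
    """Approximate number of cases (with as many controls) needed to detect
    an allele-frequency difference in a two-sided test.

    per_allele_rr must differ from 1, otherwise there is nothing to detect.
    """
    p0 = control_freq
    p1 = case_allele_frequency(p0, per_allele_rr)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return ceil(alleles_per_group / 2)  # each person contributes two alleles


if __name__ == "__main__":
    # Illustrative inputs only: a 10% allele at several per-allele risks.
    for rr in (1.2, 1.5, 2.0):
        print(rr, cases_required(0.10, rr, alpha=1e-4))
```

Under these assumptions the required number of cases rises steeply as the per-allele risk approaches 1 and as the significance threshold is tightened, which is one way to see why small studies of modest-risk alleles tend to be unreliable.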


Cancer is the result of complex interactions between inherited and environmental factors. Known genes account for a small proportion of the heritability of cancer, and it is likely that many genes with modest effects are yet to be found. Genetic-association studies have been widely used in the search for such genes, but success has been limited so far. Increased knowledge of the function of genes and the architecture of human genetic variation combined with new genotyping technologies herald a new era of gene mapping by association.


Figure 1: The number of alleles required to explain the excess familial risk of a typical common cancer according to alleles with different frequencies and conferring different risks.
Figure 2: The stages in the design of an association study for cancer-susceptibility genes.
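The kind of calculation summarized in Figure 1 can be sketched as follows. This is a hedged illustration and not the authors' own computation: it assumes alleles act multiplicatively within and between loci, that all loci have the same frequency and risk, and that the excess familial risk to be explained is a twofold risk to siblings (the default `target_familial_rr=2.0` is an assumption chosen for illustration). Under the multiplicative model, a single allele with frequency p and per-allele relative risk r gives a sibling relative risk of (1 + c/2)^2, where c = p(1 - p)(r - 1)^2 / (1 - p + pr)^2.

```python
from math import ceil, log


def sibling_relative_risk(allele_freq: float, per_allele_rr: float) -> float:
    """Sibling relative risk attributable to one multiplicative risk allele."""
    q = 1 - allele_freq
    mean = q + allele_freq * per_allele_rr                  # mean allelic risk factor
    variance = allele_freq * q * (per_allele_rr - 1) ** 2   # its variance
    return (1 + variance / (2 * mean ** 2)) ** 2


def alleles_required(allele_freq: float, per_allele_rr: float,
                     target_familial_rr: float = 2.0) -> int:
    """Number of identical, independent loci whose sibling risks multiply
    up to the target familial relative risk (an assumed value)."""
    per_locus = sibling_relative_risk(allele_freq, per_allele_rr)
    return ceil(log(target_familial_rr) / log(per_locus))


if __name__ == "__main__":
    # Illustrative grid of allele frequencies and per-allele risks.
    for freq in (0.01, 0.1, 0.3):
        for rr in (1.2, 1.5, 2.0):
            print(freq, rr, alleles_required(freq, rr))
```

Rare alleles or alleles with small risks each explain very little familial clustering, so many of them are needed, which is consistent with the polygenic picture described in the Key Points.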




We thank the referees and editors, whose comments on earlier drafts of this manuscript were very helpful.

Author information



Corresponding author

Correspondence to Bruce A. J. Ponder.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links



Entrez Gene
National Cancer Institute

breast cancer

colorectal cancer

gastric cancer


ovarian cancer


adenomatosis polyposis coli

multiple endocrine neoplasia type 2

type 1 diabetes


Fred Hutchinson Center Seattle SNPs Program

International HapMap Project

National Cancer Institute Consortium of Cohorts

National Institute of Environmental Health Sciences Environmental Genome Project SNPs Program



Penetrance: The frequency with which individuals who carry a given mutation show the manifestations associated with that mutation. If the penetrance of a disease allele is 100%, then all individuals carrying that allele will express the associated phenotype.


Linkage analysis: A statistical method in which the genotypes and phenotypes of parents and offspring in families are studied to determine whether two or more loci are assorting independently or exhibiting linkage during meiosis.


Phenocopy: A non-hereditary alteration in phenotype, induced by environmental factors such as nutritional status, that mimics the phenotype produced by a specific gene.


A polymorphism is the existence of two or more variants (alleles, sequence variants, chromosomal structural variants) at significant frequencies in the population. It is conventional for a genetic variant with a frequency of >1% to be called a polymorphism.


Haplotype: The physical arrangement of multiple alleles along a chromosome or segment of a chromosome.
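Tagging-SNP approaches rely on correlation between alleles that sit on the same haplotype. A common measure of that correlation is the squared correlation coefficient r^2 between two biallelic loci. The sketch below is illustrative only: the function name `ld_r_squared` and the haplotype labels 'AB', 'Ab', 'aB', 'ab' are assumptions, and the haplotype frequencies are taken as known.

```python
def ld_r_squared(hap_freqs: dict) -> float:
    """Squared correlation (r^2) between two biallelic loci.

    hap_freqs maps the four haplotypes 'AB', 'Ab', 'aB', 'ab' to
    frequencies that sum to 1.
    """
    p_a = hap_freqs["AB"] + hap_freqs["Ab"]   # frequency of allele A
    p_b = hap_freqs["AB"] + hap_freqs["aB"]   # frequency of allele B
    d = hap_freqs["AB"] - p_a * p_b           # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


if __name__ == "__main__":
    # Two loci whose alleles always travel together: one SNP tags the other.
    print(ld_r_squared({"AB": 0.3, "Ab": 0.0, "aB": 0.0, "ab": 0.7}))
    # Independent loci: genotyping one SNP says nothing about the other.
    print(ld_r_squared({"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}))
```

When r^2 is close to 1, genotyping one SNP captures nearly all of the information at the other, which is the idea behind choosing a reduced set of tagging SNPs.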


A tandem repeat is two or more copies of the same DNA sequence arranged in a direct head to tail succession along a chromosome. The number of copies of the repeat might vary in the population.


Restriction fragment length polymorphism: A polymorphic difference in DNA sequence between individuals that can be recognized by restriction endonucleases.


Single-nucleotide polymorphism: Any polymorphic variation at a single nucleotide (base) in the genome.


The relative risk of disease associated with a particular risk factor (also known as an exposure), such as a particular genotype, is the ratio of the incidence of disease in individuals with that risk factor to the incidence of disease in individuals without the risk factor.
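The definition above translates directly into a ratio of two incidences. The following minimal sketch uses hypothetical cohort counts and a hypothetical function name, `relative_risk`, purely for illustration.

```python
def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """Incidence in people with the risk factor divided by incidence in
    people without it (cohort-style counts)."""
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group sizes must be positive")
    if unexposed_cases == 0:
        raise ValueError("incidence in the unexposed group is zero")
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return incidence_exposed / incidence_unexposed


if __name__ == "__main__":
    # Hypothetical counts: 30 of 1,000 carriers and 10 of 1,000 non-carriers affected.
    print(relative_risk(30, 1000, 10, 1000))
```

In a case-control design the incidences are not observed directly, so the odds ratio is usually reported instead and approximates the relative risk when the disease is rare.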


The estimation of haplotype frequencies in a population is complicated by the fact that haplotypes for diploid data are not usually directly observable. Haplotypes can be resolved (inferred) by using parental genotype data or estimated using statistical methods.
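One widely used statistical approach is the expectation-maximization (EM) algorithm. The sketch below is a simplified illustration for two biallelic SNPs, not a method described in the article: genotypes are coded as counts (0, 1 or 2) of allele "1" at each SNP, random mating is assumed, and only double heterozygotes are ambiguous in phase. The names `em_haplotype_frequencies` and `HAPLOTYPES` are hypothetical.

```python
from collections import Counter

HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_haplotype_frequencies(genotypes, iterations: int = 50) -> dict:
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2) pairs, each g being 0, 1 or 2 copies of
    allele 1. Assumes random mating (Hardy-Weinberg proportions).
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    n_chromosomes = 2 * len(genotypes)
    fixed = Counter()      # haplotypes whose phase is unambiguous
    double_hets = 0        # individuals heterozygous at both SNPs
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            double_hets += 1
            continue
        # With at most one heterozygous SNP the haplotype pair is determined.
        fixed[(int(g1 >= 1), int(g2 >= 1))] += 1
        fixed[(int(g1 == 2), int(g2 == 2))] += 1

    freqs = {h: 0.25 for h in HAPLOTYPES}  # uninformative starting point
    for _ in range(iterations):
        # E-step: probability that a double heterozygote carries 00/11
        # rather than 01/10, given the current frequency estimates.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = {h: float(fixed[h]) for h in HAPLOTYPES}
        counts[(0, 0)] += double_hets * p_cis
        counts[(1, 1)] += double_hets * p_cis
        counts[(0, 1)] += double_hets * (1 - p_cis)
        counts[(1, 0)] += double_hets * (1 - p_cis)
        # M-step: re-estimate frequencies from expected haplotype counts.
        freqs = {h: counts[h] / n_chromosomes for h in HAPLOTYPES}
    return freqs


if __name__ == "__main__":
    sample = [(0, 0)] * 5 + [(2, 2)] * 5 + [(1, 1)] * 10
    print(em_haplotype_frequencies(sample))
```

A fixed iteration count keeps the sketch short; a practical implementation would normally stop when the likelihood or the frequency estimates stop changing, and would handle more than two loci.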


In a whole-genome linkage analysis, the strength of linkage at any given marker is given by the log of odds (LOD) score. A high LOD score at one or several adjacent markers can be called a linkage peak.
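For the simplest case, the LOD score can be computed directly. The sketch below is an illustration under stated assumptions and is not drawn from the article: it treats fully informative, phase-known meioses, in which the number of recombinants between marker and disease locus can simply be counted. The function name `lod_score` is hypothetical.

```python
from math import log10


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Log10 odds of linkage at recombination fraction theta versus no
    linkage (theta = 0.5), for phase-known, fully informative meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_likelihood_linked = (recombinants * log10(theta)
                             + non_recombinants * log10(1 - theta))
    log_likelihood_unlinked = meioses * log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


if __name__ == "__main__":
    # Scan a few recombination fractions for 1 recombinant in 12 meioses.
    for theta in (0.01, 0.05, 0.1, 0.2, 0.5):
        print(theta, round(lod_score(1, 12, theta), 3))
```

Scanning theta and taking the maximum gives the peak LOD score at a marker; by convention a LOD score of 3 or more is commonly taken as evidence for linkage.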


A congenic strain is derived by mating mice carrying a locus of interest in each succeeding generation to mice of an inbred strain. A fully congenic strain and the inbred partner are expected to be identical at all loci except for the transferred locus and a linked segment of chromosome.


Synteny: The physical presence of two or more genetic loci on the same chromosome, whether or not they are close enough together to demonstrate linkage.


When a population expands from a limited number of individuals, those individuals are known as founders. The founder effect is when a particular allele is frequent in a population derived from a small number of founders.


Multifactor-dimensionality reduction: Uses case-control data to pool multilocus genotypes into either a high-risk or a low-risk group, effectively reducing the number of genotype predictors to one. The new one-dimensional multilocus genotype can then be evaluated to classify and predict disease status.
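The pooling step described above can be sketched as follows. This is a simplified illustration and not the published procedure in full: it covers only the labelling of multilocus genotypes as high or low risk for one fixed set of loci, and omits the search over locus combinations and the cross-validation that the complete method uses. The names `fit_high_risk_genotypes` and `predict_case` are hypothetical, and the threshold (the overall case-to-control ratio) is an assumption.

```python
from collections import Counter


def fit_high_risk_genotypes(genotypes, is_case) -> set:
    """Label each multilocus genotype as high risk when its case:control
    ratio exceeds the overall case:control ratio in the data.

    genotypes: list of tuples, one genotype code per locus.
    is_case:   list of booleans aligned with genotypes.
    """
    cases = Counter(g for g, c in zip(genotypes, is_case) if c)
    controls = Counter(g for g, c in zip(genotypes, is_case) if not c)
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    threshold = n_cases / n_controls
    # Cross-multiplying avoids dividing by zero for genotypes with no controls.
    return {g for g in set(cases) | set(controls)
            if cases[g] > threshold * controls[g]}


def predict_case(high_risk: set, genotype) -> bool:
    """The single derived predictor: is this genotype in the high-risk pool?"""
    return genotype in high_risk


if __name__ == "__main__":
    # Toy data for two loci: genotype (0, 1) is enriched among cases.
    genos = [(0, 1)] * 8 + [(0, 1)] * 2 + [(1, 1)] * 2 + [(1, 1)] * 8
    status = [True] * 8 + [False] * 2 + [True] * 2 + [False] * 8
    pool = fit_high_risk_genotypes(genos, status)
    print(pool, predict_case(pool, (0, 1)))
```

Because the high-risk labels are fitted to the same data they classify, the apparent accuracy is optimistic, which is why the full method evaluates the predictor on held-out data.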


About this article

Cite this article

Pharoah, P., Dunning, A., Ponder, B. et al. Association studies for finding cancer-susceptibility genetic variants. Nat Rev Cancer 4, 850–860 (2004).
