
Genes, environment and the value of prospective cohort studies


Case–control studies have many advantages for identifying disease-related genes, but are limited in their ability to detect gene–environment interactions. The prospective cohort design provides a valuable complement to case–control studies. Although it has disadvantages in duration and cost, it has important strengths in characterizing exposures and risk factors before disease onset, which reduces important biases that are common in case–control studies. This and other strengths of prospective cohort studies make them invaluable for understanding gene–environment interactions in complex human disease.


Figure 1: The importance of gene–environment interactions — an example.
Figure 2: The case–control and prospective cohort study designs.
Figure 3: Sample-size requirements in prospective cohort studies.




The authors express appreciation to M. Boehnke, E. Boerwinkle, B. Foxman, M. Khoury, L. Kuller, J. Ordovas and B. Psaty for their critical review and comments on this manuscript.

Author information

Corresponding author

Correspondence to Teri A. Manolio.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links




Parkinson disease


Type 2 Diabetes


Biobank Japan

Ethics and Governance Framework of the UK Biobank

Incidence and Prevalence Database

National Health Infrastructure Initiative

National Institute of Environmental Health Sciences

NHGRI Ethical, Legal and Social Issues

NHGRI Expert Panel Recommendations for a population-based cohort

NIH Genes and Environment Initiative

Responses to NHGRI Request for Information

SEER Cancer Statistics Review, 1975-2002

Swedish National Biobank

The TDR Incidence and Prevalence Database

UK Biobank

Women's Health Initiative



Exposure

A putative cause or characteristic determinant of a health outcome of interest.

Risk factor

An attribute or exposure that increases the probability of disease or other outcome; used by some to mean causal factor or 'determinant' and by others to mean 'risk marker'.


Cohort

Originally defined as a group of people born during a particular period (a 'birth cohort'); now broadened to include any designated group of people who are followed or traced over time.

Risk marker

An attribute or exposure that is associated with an increase in the probability of a specified outcome, but is not necessarily a causal factor.

Population stratification

The presence of different allele frequencies in cases and controls that is attributable to diversity in the background population and is unrelated to outcome status.

Ancestry informative (ancestral) marker

A locus with several polymorphisms that exhibit substantially different frequencies between ancestral populations. For example, the Duffy null allele has a frequency of almost 100% in sub-Saharan Africans, but occurs infrequently in other populations.


Incidence

The number of new cases of disease that develop during a period of time.

Odds ratio (or relative odds)

The odds of disease in the individuals exposed to an environmental factor or genetic variant divided by the odds in unexposed individuals; or the odds of exposure in the cases divided by the odds in the controls (they are algebraically equivalent). If the odds ratio is significantly greater than one, then the environmental factor or genetic variant is associated with the disease.
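
As a minimal sketch of the algebraic equivalence noted in this definition, the Python snippet below computes the odds ratio both ways from a 2×2 table. The counts and function names are hypothetical and chosen only for illustration; they do not come from the article.

```python
import math


def odds_ratio_by_disease(a, b, c, d):
    """Odds of disease in the exposed (a/b) divided by odds of disease in the unexposed (c/d)."""
    return (a / b) / (c / d)


def odds_ratio_by_exposure(a, b, c, d):
    """Odds of exposure in the cases (a/c) divided by odds of exposure in the controls (b/d)."""
    return (a / c) / (b / d)


# Hypothetical 2x2 table (illustrative counts only):
#   a = exposed cases,   b = exposed controls,
#   c = unexposed cases, d = unexposed controls
a, b, c, d = 30, 70, 10, 90

or_disease = odds_ratio_by_disease(a, b, c, d)
or_exposure = odds_ratio_by_exposure(a, b, c, d)

# Both forms reduce to the cross-product ratio (a * d) / (b * c).
cross_product = (a * d) / (b * c)

print(or_disease, or_exposure, cross_product)
print(math.isclose(or_disease, or_exposure))
```

Both functions simplify to the cross-product ratio ad/bc, which is why a case–control study, which samples on disease status, can still estimate the same odds ratio as a cohort study.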

Study power

The probability of rejecting the null hypothesis of no association in a study if it is in fact false, or of detecting a difference between two groups if one does in fact exist.

Type I error rate

The probability of rejecting the null hypothesis of no association in a study if it is in fact true, or of detecting a difference between two groups when no difference exists.
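
The definitions of study power and type I error rate can be made concrete with a small simulation. The sketch below is illustrative only: it assumes a two-sided two-proportion z-test at the 5% level, and the disease risks (10% versus 20%), the group size of 200 and all function names are hypothetical choices, not values from the article.

```python
import math
import random


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z-statistic for the difference between two observed proportions."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # no events (or all events) in both groups: no evidence of a difference
    return (x1 / n1 - x2 / n2) / se


def rejection_rate(p1, p2, n, trials, rng, z_crit=1.96):
    """Fraction of simulated studies in which the null hypothesis is rejected."""
    rejections = 0
    for _ in range(trials):
        # Simulate the number of disease events in each group of size n.
        x1 = sum(rng.random() < p1 for _ in range(n))
        x2 = sum(rng.random() < p2 for _ in range(n))
        if abs(two_proportion_z(x1, n, x2, n)) > z_crit:
            rejections += 1
    return rejections / trials


rng = random.Random(0)  # fixed seed so the run is repeatable

# Null hypothesis true (equal risks): the rejection rate estimates the type I error rate.
type_i = rejection_rate(0.10, 0.10, n=200, trials=2000, rng=rng)

# Null hypothesis false (risks differ): the rejection rate estimates the study power.
power = rejection_rate(0.10, 0.20, n=200, trials=2000, rng=rng)

print(f"estimated type I error rate: {type_i:.3f}")
print(f"estimated power: {power:.3f}")
```

Under these assumptions the first estimate should fall near the nominal 5% level. The second depends on the effect size and the sample size, which is why sample-size requirements matter so much when designing a cohort study.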


About this article

Cite this article

Manolio, T., Bailey-Wilson, J. & Collins, F. Genes, environment and the value of prospective cohort studies. Nat Rev Genet 7, 812–820 (2006).
