Genes, environment and the value of prospective cohort studies


Case–control studies have many advantages for identifying disease-related genes, but are limited in their ability to detect gene–environment interactions. The prospective cohort design provides a valuable complement to case–control studies. Although it has disadvantages in duration and cost, it has important strengths in characterizing exposures and risk factors before disease onset, which reduces important biases that are common in case–control studies. This and other strengths of prospective cohort studies make them invaluable for understanding gene–environment interactions in complex human disease.
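A small worked example may help make "gene–environment interaction" concrete. The sketch below uses made-up absolute risks (cases per 1,000 people) for the four combinations of carrying a variant (G) and being exposed (E). It computes two standard epidemiological summaries: the multiplicative interaction ratio, RR_GE / (RR_G × RR_E), and the relative excess risk due to interaction (RERI) on the additive scale. The function names are hypothetical, and these measures are offered only as illustrations, not as the method used in this article.

```python
def relative_risks(r00, r10, r01, r11):
    """Relative risks versus the group with neither the variant nor the exposure.

    rGE is an absolute risk (here, cases per 1,000 people) for genotype G
    (1 = variant carrier) and exposure E (1 = exposed).
    """
    return r10 / r00, r01 / r00, r11 / r00


def interaction_measures(r00, r10, r01, r11):
    """Return (multiplicative interaction ratio, RERI).

    A ratio of 1 means the joint effect equals the product of the separate
    effects; RERI = 0 means it equals their sum (no additive interaction).
    """
    rr_g, rr_e, rr_ge = relative_risks(r00, r10, r01, r11)
    multiplicative = rr_ge / (rr_g * rr_e)
    reri = rr_ge - rr_g - rr_e + 1
    return multiplicative, reri


# Hypothetical risks per 1,000: neither 10, variant only 20, exposure only 30, both 120
print(interaction_measures(10, 20, 30, 120))  # (2.0, 8.0)
```

If the jointly exposed carriers instead had a risk of 60 per 1,000, the multiplicative ratio would be 1.0 while RERI would be 2.0. Whether "interaction" is present therefore depends on the scale on which it is measured.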

Figure 1: The importance of gene–environment interactions — an example.
Figure 2: The case–control and prospective cohort study designs.
Figure 3: Sample-size requirements in prospective cohort studies.


The authors express appreciation to M. Boehnke, E. Boerwinkle, B. Foxman, M. Khoury, L. Kuller, J. Ordovas and B. Psaty for their critical review and comments on this manuscript.

Exposure

A putative cause or characteristic determinant of a health outcome of interest.

Risk factor

An attribute or exposure that increases the probability of disease or other outcome; used by some to mean causal factor or 'determinant' and by others to mean 'risk marker'.


Cohort

Originally defined as a group of people born during a particular period (a 'birth cohort'); now broadened to include any designated group of people who are followed or traced over time.

Risk marker

An attribute or exposure that is associated with an increase in the probability of a specified outcome, but is not necessarily a causal factor.

Population stratification

Differences in allele frequencies between cases and controls that are attributable to diversity in the background population and are unrelated to outcome status.
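A minimal sketch, using made-up numbers, of how stratification alone can create a spurious association. Two hypothetical subpopulations differ in both carrier frequency and disease risk, but within each one the variant has no effect on risk. The helper names `expected_counts` and `odds_ratio` are illustrative only. Exact fractions are used so that the within-subpopulation odds ratios come out as exactly 1.

```python
from fractions import Fraction


def expected_counts(n, carrier_freq, disease_risk):
    """Expected 2x2 counts for one subpopulation in which disease risk does
    NOT depend on genotype.

    Returns (carrier cases, non-carrier cases, carrier controls, non-carrier controls).
    """
    carriers = n * carrier_freq
    noncarriers = n - carriers
    return (carriers * disease_risk,
            noncarriers * disease_risk,
            carriers * (1 - disease_risk),
            noncarriers * (1 - disease_risk))


def odds_ratio(table):
    """Cross-product odds ratio a*d / (b*c) for a table ordered as above."""
    a, b, c, d = table
    return (a * d) / (b * c)


# Hypothetical subpopulations: A has a common variant and high risk,
# B has a rare variant and low risk.
pop_a = expected_counts(1000, Fraction(3, 5), Fraction(1, 5))
pop_b = expected_counts(1000, Fraction(1, 10), Fraction(1, 20))
pooled = tuple(x + y for x, y in zip(pop_a, pop_b))

print(odds_ratio(pop_a), odds_ratio(pop_b))  # 1 1
print(odds_ratio(pooled))                    # 47/23
```

Pooling the two groups yields an odds ratio of 47/23 (about 2.04), even though the variant is unrelated to disease in each group. Because the odds ratio is unchanged when cases and controls are sampled independently of genotype, the same distortion would appear in a case–control sample drawn from this mixed population.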

Ancestry informative (ancestral) marker

A locus whose alleles exhibit substantially different frequencies between ancestral populations. For example, the Duffy null allele has a frequency of almost 100% in sub-Saharan Africans, but occurs infrequently in other populations.


Incidence

The number of new cases of disease that develop during a period of time.
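Counting incident cases requires excluding people who already have the disease at the start of follow-up. The sketch below shows this with a hypothetical `Participant` record. It also computes cumulative incidence (new cases divided by the number at risk at baseline), which is a related quantity and assumes complete follow-up with no losses or competing events. All names and data here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    diseased_at_baseline: bool
    onset_year: Optional[float]  # years after baseline at diagnosis; None if not diagnosed


def new_cases(cohort: List[Participant], period_years: float) -> int:
    """Incident cases: people disease-free at baseline who are diagnosed within the period."""
    return sum(1 for p in cohort
               if not p.diseased_at_baseline
               and p.onset_year is not None
               and p.onset_year <= period_years)


def cumulative_incidence(cohort: List[Participant], period_years: float) -> float:
    """New cases divided by those at risk at baseline (assumes complete follow-up)."""
    at_risk = sum(1 for p in cohort if not p.diseased_at_baseline)
    return new_cases(cohort, period_years) / at_risk


cohort = [
    Participant(diseased_at_baseline=True, onset_year=None),   # prevalent case: not at risk
    Participant(diseased_at_baseline=False, onset_year=2.0),
    Participant(diseased_at_baseline=False, onset_year=7.5),
    Participant(diseased_at_baseline=False, onset_year=None),
    Participant(diseased_at_baseline=False, onset_year=None),
]
print(new_cases(cohort, 5), cumulative_incidence(cohort, 5))    # 1 0.25
print(new_cases(cohort, 10), cumulative_incidence(cohort, 10))  # 2 0.5
```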

Odds ratio (or relative odds)

The odds of disease in individuals exposed to an environmental factor or carrying a genetic variant divided by the odds in unexposed individuals; or the odds of exposure in the cases divided by the odds in the controls (the two are algebraically equivalent). If the odds ratio is significantly greater than one, the environmental factor or genetic variant is positively associated with the disease; if it is significantly less than one, the association is inverse.
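The algebraic equivalence of the two definitions can be checked directly from a 2×2 table. In the sketch below, a = exposed cases, b = unexposed cases, c = exposed controls and d = unexposed controls. Both definitions reduce to the cross-product ad/bc. Whether the ratio is "significantly" above one is judged here with Woolf's log-scale 95% confidence interval, which is one common approximation and not necessarily the one used in any particular study. The counts are hypothetical.

```python
import math
from fractions import Fraction


def odds_ratio_disease(a, b, c, d):
    """Odds of disease among the exposed divided by odds among the unexposed."""
    return Fraction(a, c) / Fraction(b, d)


def odds_ratio_exposure(a, b, c, d):
    """Odds of exposure among cases divided by odds among controls."""
    return Fraction(a, b) / Fraction(c, d)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI for the odds ratio on the log scale; all cells must be > 0."""
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: 60 exposed and 40 unexposed cases; 140 exposed and 160 unexposed controls
a, b, c, d = 60, 40, 140, 160
print(odds_ratio_disease(a, b, c, d), odds_ratio_exposure(a, b, c, d))  # 12/7 12/7
lower, upper = woolf_ci(a, b, c, d)
print(lower > 1)  # True: the interval excludes 1
```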

Study power

The probability of rejecting the null hypothesis of no association in a study if it is in fact false, or of detecting a difference between two groups if one does in fact exist.
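As an illustration of how power depends on effect size and sample size, the sketch below applies the usual normal-approximation formula for a two-sided test comparing two independent proportions with equal group sizes. Examples would be the exposure frequency in cases versus controls, or incidence in exposed versus unexposed cohort members. The function name and the example proportions are hypothetical, and the negligible probability of rejecting in the wrong direction is ignored.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test of p1 vs p2 with n_per_group in each group."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)      # critical value, e.g. 1.96 for alpha = 0.05
    p_bar = (p1 + p2) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)          # SE under H0
    se_alt = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)   # SE under H1
    return std_normal.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)


# Hypothetical: 30% vs 20%, 300 people per group
print(round(power_two_proportions(0.30, 0.20, 300), 2))
```

With these inputs the approximate power is about 0.8. Raising `n_per_group` increases it, while a stricter `alpha` lowers it.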

Type I error rate

The probability of rejecting the null hypothesis of no association in a study if it is in fact true, or of detecting a difference between two groups when no difference exists.
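The type I error rate can be checked empirically by simulating many studies in which the null hypothesis is true by construction and counting how often a test rejects it. In the sketch below, two groups share the same hypothetical proportion, and a pooled two-proportion z-test at alpha = 0.05 is applied to each simulated study. The rejection fraction should land close to 0.05. The function names, proportion, group size and seed are all illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def z_test_rejects(x1, x2, n, alpha=0.05):
    """Two-sided pooled z-test of x1/n vs x2/n; True if H0 (equal proportions) is rejected."""
    p_pool = (x1 + x2) / (2 * n)
    se = math.sqrt(p_pool * (1 - p_pool) * 2 / n)
    if se == 0:  # all or no events in both groups: no evidence against H0
        return False
    z = (x1 / n - x2 / n) / se
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def simulated_type_i_error(p=0.2, n=200, reps=2000, alpha=0.05, seed=1):
    """Fraction of null-true simulated studies in which the test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x1 = sum(rng.random() < p for _ in range(n))  # events in group 1
        x2 = sum(rng.random() < p for _ in range(n))  # events in group 2, same true p
        rejections += z_test_rejects(x1, x2, n, alpha)
    return rejections / reps


print(simulated_type_i_error())  # close to 0.05
```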

