Like cohort studies, case-control studies aim to establish an association between exposure to risk factors and disease. Unlike cohort studies, however, they select members of the population who already have the disease at the outset and collect risk factor information retrospectively (figure 1). These individuals are known as the cases. A second group of individuals who do not have the disease, the controls, is also included in the study (figure 2). The case-control design is often used to study rare diseases, or as a preliminary study where little is known about the association between the risk factor and the disease of interest.

Figure 1: Prospective and retrospective studies

Figure 2: Diagrammatic representation of a case-control study

Case-control studies are prone to bias and confounding. To minimise bias, care must be taken in selecting both cases and controls, in establishing definitions of disease and risk factors, and in ensuring that there are no confounding associations between detection of disease and exposure to the risk factor.

Choice of cases

Care must be taken when choosing cases for the study. In particular, it is important to distinguish between stages or subtypes of disease and to define a measure of health status. For example, when studying oral cleanliness it is important to define what is meant by oral cleanliness in terms of cause, nature and degree. It is also important to establish whether interest lies in incident cases, subjects entered into the study on detection of disease, or prevalent cases, those who were diagnosed with the disease before the study began. Views, behaviours and reports of exposure to risk factors will tend to differ between these two groups, as those diagnosed previously are likely to be better informed about the disease and may have altered their behaviour and attitudes since diagnosis. An incident case design is preferred, as it reduces recall bias and the over-representation of cases with long-standing disease.

Choice of controls

Controls should come from the same population at risk of disease as the cases, should not have the disease and should be representative of the target population. Selecting controls often proves harder than selecting cases and requires great care to prevent bias. A sampling frame of hospital patients is often used to select controls; however, risk factors such as diet and smoking are linked to many diseases, so hospital patients may be more heavily exposed than the general population. Selecting controls in this way might therefore over-estimate population exposure to such risk factors, resulting in an underestimation of the association between disease and exposure. Using more than one control group helps to overcome this type of problem.

Multiple controls can be used for each case, giving the study greater power, particularly where the number of cases is small, for example because the disease is rare.
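The gain from adding controls levels off quickly. A widely quoted approximation, assumed here and holding roughly when the true effect is modest, is that using k controls per case gives about k/(k+1) of the statistical efficiency achievable with an unlimited number of controls. The Python sketch below, using a hypothetical helper `relative_efficiency`, tabulates this rule of thumb:

```python
def relative_efficiency(k: int) -> float:
    """Approximate efficiency of k controls per case relative to unlimited
    controls, using the rule of thumb k / (k + 1) (an approximation only)."""
    if k < 1:
        raise ValueError("need at least one control per case")
    return k / (k + 1)


if __name__ == "__main__":
    # Print the approximate efficiency for 1 to 6 controls per case
    for k in range(1, 7):
        print(f"{k} control(s) per case: efficiency ~ {relative_efficiency(k):.2f}")
```

Under this approximation, moving from one to four controls per case raises efficiency from 0.50 to 0.80, while further controls add little, which is why more than about four controls per case is rarely worthwhile.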

Exposure to risk factors and matching

Exposure measurements rely on memory, where cases and controls are interviewed retrospectively, and/or on medical records. Exposure estimates are therefore vulnerable to recall bias, since those with the disease are commonly more likely to remember exposure than those without; to interview or measurement bias, where the interviewer questions participants or records findings systematically differently for cases and controls; and to confounding factors. Interview and measurement bias can be reduced by blinding interviewers, so that they do not know who is a case and who is a control at the time of interview.

Confounding factors must be identified before the start of the study. Individual cases can be matched to controls where it is thought that factors other than the risk factors of interest might contribute to the development of disease and confound the causal association under investigation. Cases and controls are commonly matched by age and sex. A factor on which cases and controls are matched cannot itself be studied as a risk factor, because matching makes its distribution similar in the two groups. An alternative way of overcoming confounding is to collect information on the confounding factor during the study and adjust for it at the analysis stage.
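One standard way to adjust for a confounder at the analysis stage (one option among several; logistic regression is another) is to stratify the data by the confounder and pool the stratum-specific tables with the Mantel-Haenszel estimator, OR_MH = Σ(a·d/n) / Σ(b·c/n), summed over strata, where n is the stratum total and a, b, c and d are the numbers of cases exposed, cases unexposed, controls exposed and controls unexposed in each stratum. The Python sketch below uses hypothetical counts and hypothetical helper names:

```python
from typing import Sequence, Tuple

# Each stratum is (a, b, c, d):
#   a = cases exposed,    b = cases unexposed,
#   c = controls exposed, d = controls unexposed
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_or(strata: Sequence[Stratum]) -> float:
    """Mantel-Haenszel odds ratio pooled across strata of a confounder."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def crude_or(strata: Sequence[Stratum]) -> float:
    """Odds ratio after collapsing all strata into one 2x2 table (no adjustment)."""
    a, b, c, d = (sum(s[i] for s in strata) for i in range(4))
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical counts stratified by a confounder such as age group
    strata = [
        (10, 20, 15, 60),  # younger: stratum-specific OR = 2.0
        (30, 15, 20, 20),  # older:   stratum-specific OR = 2.0
    ]
    print(f"crude OR:    {crude_or(strata):.2f}")
    print(f"adjusted OR: {mantel_haenszel_or(strata):.2f}")
```

In this invented example the older group contains both more exposed controls and proportionally more cases, so the crude odds ratio (about 2.61) overstates the association; within each stratum the odds ratio is 2.0, and the Mantel-Haenszel estimate recovers that value.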

Analysis of Data

The table used to analyse the data, Table 1, looks much like that used in a cohort study design (see Pub 4). However, because a case-control study is retrospective and the numbers of cases and controls are fixed by the investigator, the risk of disease cannot be estimated directly. Instead of measuring the relative risk of disease based on exposure, we therefore measure the odds of exposure based on disease status.

Table 1

                Exposed    Unexposed
  Cases            a           b
  Controls         c           d

The odds ratio is made up of two components:

  • the odds of exposure for cases: the number of cases exposed divided by the number of cases unexposed, given by Odds(cases) = a/b

  • the odds of exposure for controls: the number of controls exposed divided by the number of controls unexposed, given by Odds(controls) = c/d

The estimated Odds Ratio is then OR = (a/b)/(c/d) = ad/bc.
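The calculation is straightforward to script. A minimal Python sketch, with a hypothetical function name and hypothetical counts, where a, b, c and d are the numbers of cases exposed, cases unexposed, controls exposed and controls unexposed:

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2x2 case-control table.

    a = cases exposed,    b = cases unexposed
    c = controls exposed, d = controls unexposed
    """
    if b == 0 or c == 0 or d == 0:
        raise ValueError("b, c and d must be non-zero to form the odds ratio")
    odds_cases = a / b             # odds of exposure among cases
    odds_controls = c / d          # odds of exposure among controls
    return odds_cases / odds_controls  # equivalent to (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical study: 100 cases and 100 controls
    print(f"OR = {odds_ratio(40, 60, 20, 80):.2f}")
```

With these numbers the odds of exposure are 40/60 ≈ 0.67 among cases and 20/80 = 0.25 among controls, giving OR ≈ 2.67.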

The Odds Ratio is interpreted as follows:

  • OR < 1: Odds of exposure for cases are lower than those for controls. Exposure appears to reduce the risk of disease.

  • OR = 1: Odds of exposure for cases are the same as those for controls. Exposure does not appear to be a risk factor.

  • OR > 1: Odds of exposure for cases are higher than those for controls. Exposure appears to increase the risk of disease.
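An odds ratio of exposure can be read in terms of risk of disease because the odds ratio is symmetric: the odds ratio of exposure given disease equals the odds ratio of disease given exposure, and when the disease is rare both are close to the relative risk. The Python sketch below checks this on a hypothetical cohort in which every member's exposure and disease status is known; all counts are invented for illustration.

```python
# Hypothetical full cohort (all counts invented for illustration)
exposed_total = 10_000
exposed_diseased = 30
unexposed_total = 20_000
unexposed_diseased = 20

exposed_healthy = exposed_total - exposed_diseased        # 9,970
unexposed_healthy = unexposed_total - unexposed_diseased  # 19,980

# Relative risk: ratio of the risks of disease (requires cohort data)
relative_risk = (exposed_diseased / exposed_total) / (unexposed_diseased / unexposed_total)

# Odds ratio of disease, comparing exposed with unexposed
disease_or = (exposed_diseased / exposed_healthy) / (unexposed_diseased / unexposed_healthy)

# Odds ratio of exposure, comparing diseased (cases) with healthy (controls);
# this is the quantity a case-control study estimates
exposure_or = (exposed_diseased / unexposed_diseased) / (exposed_healthy / unexposed_healthy)

print(f"relative risk:         {relative_risk:.3f}")
print(f"odds ratio (disease):  {disease_or:.3f}")
print(f"odds ratio (exposure): {exposure_or:.3f}")
```

The two odds ratios are identical (about 3.006) and differ only slightly from the relative risk of 3.0 because well under 1% of this cohort develops the disease; with a common disease the gap would be larger.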

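A 95% confidence interval indicates how precisely the Odds Ratio has been estimated. For example, if the entire 95% confidence interval lies above 1, it is concluded that exposure significantly increases the risk of disease at the 5% significance level; if it lies entirely below 1, exposure significantly reduces the risk; and if it includes 1, there is no statistically significant association.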
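The method for calculating the interval is not specified here. One common approach, assumed in this sketch, is Woolf's (logit) method: the standard error of ln(OR) is estimated as √(1/a + 1/b + 1/c + 1/d), and the 95% interval is exp(ln(OR) ± 1.96 × SE). The Python sketch below, with hypothetical function names and counts, computes the interval and applies the interpretation rule above; it requires all four cells to be non-zero.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (logit) confidence interval.

    a = cases exposed, b = cases unexposed,
    c = controls exposed, d = controls unexposed (all must be > 0).
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("Woolf's method needs all four cells to be non-zero")
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


def interpret(lower: float, upper: float) -> str:
    """Classify the association from the confidence interval, as in the text."""
    if lower > 1:
        return "exposure significantly increases risk"
    if upper < 1:
        return "exposure significantly reduces risk"
    return "no significant association (interval includes 1)"


if __name__ == "__main__":
    or_hat, lo, hi = odds_ratio_ci(40, 60, 20, 80)  # hypothetical counts
    print(f"OR = {or_hat:.2f}, 95% CI {lo:.2f} to {hi:.2f}: {interpret(lo, hi)}")
```

With these invented counts the interval runs from roughly 1.42 to 5.02, lying entirely above 1, so exposure would be judged to increase the risk of disease significantly.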

Advantages of case-control studies

Case-control studies are relatively quick and cheap, and are particularly suited to the study of rare diseases because people with the disease are selected at the outset of the study.

Disadvantages of case-control studies

The disadvantages of case-control studies include:

  • difficulty in overcoming potential bias and confounding

  • difficulty in selecting cases and controls who are representative of their respective populations

  • an inability to infer causality, and no information on the chronology of exposure and disease

  • inefficiency in studying risk factors that are rare

  • studies are often not population-based, so it is not possible to calculate the incidence of disease.