
Case-control studies provide estimates of how much more likely an outcome is amongst people who are subject to a particular exposure than amongst people who are not [1,2,3,4]. They are therefore helpful for answering questions about the aetiology of a disease or condition (i.e. an outcome).

Case-control studies are particularly useful for studying the cause of an outcome that is rare and for studying the effects of prolonged exposure. For example, a case-control study could be used to determine whether long-term use of indwelling catheters (the exposure) causes bladder cancer (the outcome) in people with spinal cord injury. (This is an example of a study of the effect of a prolonged exposure on the risk of a rare disease. Incidentally, the causal link between catheters and bladder cancer is contentious [5,6,7].)

In this example, the cases would be people with spinal cord injury, from the study base, who develop bladder cancer. It is important that all cases, or a random sample of all cases, from the study base are identified; they should not merely be a sample of convenience. (The study base might be, for example, all of the people with a spinal cord injury in a geographical area, or all of the people who, if they developed the disease of interest, would present at a particular hospital.)

The controls should be sampled from the same study base of people with spinal cord injury. Controls must be sampled in a way that is not influenced by whether they are or are not exposed. So in our example, the controls would be a randomly selected group of people with spinal cord injury drawn from the same study base as the cases, irrespective of whether they do or do not have bladder cancer and irrespective of whether they have or have not been exposed to indwelling catheters. Theoretically, a person could be both a case and a control, although this is unlikely to happen because bladder cancer is rare.

Data must be collected on exposures and outcomes of every participant. In the current example, data must be collected on the use of indwelling catheters and presence of bladder cancer. From these data, it is possible to construct an odds ratio that depicts how much more likely a person who has used indwelling catheters is to develop bladder cancer than a person who has not used indwelling catheters. Case-control studies are observational studies, so even if cases and controls are sampled without regard to exposure, it is still necessary to rigorously adjust for confounding.
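As a rough illustration of the odds ratio described above, the following Python sketch computes a crude (unadjusted) odds ratio from a two-by-two table of exposure counts, with an approximate 95% confidence interval from the standard log (Woolf) method. The function name and the counts are hypothetical and are not drawn from any study. A crude estimate like this does not address confounding, which would still need to be handled in a real analysis.

```python
import math


def crude_odds_ratio(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls, z=1.96):
    """Return (odds_ratio, lower, upper) for a 2x2 exposure table.

    The interval uses the log (Woolf) approximation, which needs every
    cell to be greater than zero.
    """
    cells = (exposed_cases, unexposed_cases,
             exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")

    # Exposure odds among cases divided by exposure odds among controls.
    odds_ratio = ((exposed_cases / unexposed_cases)
                  / (exposed_controls / unexposed_controls))

    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(sum(1.0 / count for count in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 30 of 50 cases and 80 of 200 controls
# had used indwelling catheters.
estimate, lower, upper = crude_odds_ratio(30, 20, 80, 120)
print(f"odds ratio = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these made-up counts the odds ratio is 2.25: the odds of catheter use among cases (30/20) are 2.25 times the odds among controls (80/120).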

Often, researchers conduct a different sort of study and erroneously call it a case-control study. In that design, researchers sample controls from a population that does not develop the disease of interest. For example, they sample people with spinal cord injury who do not develop bladder cancer. When controls are sampled in this way, the odds ratios may provide biased estimates of the causal effect, even if confounding is rigorously controlled.
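One way to see how the choice of controls matters is to compare the two sampling schemes on a hypothetical study base, assuming the quantity of interest is the risk ratio. The sketch below works with expected counts rather than random samples, so it shows only the systematic difference and not sampling variability. The function names, the study-base sizes and the risks are all invented for illustration, and the outcome is deliberately made common so that the discrepancy is visible.

```python
def exposure_odds_ratio(case_exposed, case_unexposed,
                        control_exposed, control_unexposed):
    """Exposure odds in cases divided by exposure odds in controls."""
    return ((case_exposed / case_unexposed)
            / (control_exposed / control_unexposed))


def compare_control_sampling(n_exposed, n_unexposed,
                             risk_exposed, risk_unexposed):
    """Return (risk_ratio, or_study_base, or_non_cases).

    or_study_base: controls represent the whole study base.
    or_non_cases:  controls represent only people without the outcome.
    """
    cases_exposed = n_exposed * risk_exposed
    cases_unexposed = n_unexposed * risk_unexposed
    risk_ratio = risk_exposed / risk_unexposed

    # Controls drawn from everyone in the study base, whatever their outcome.
    or_study_base = exposure_odds_ratio(
        cases_exposed, cases_unexposed, n_exposed, n_unexposed)

    # Controls drawn only from people who did not develop the outcome.
    or_non_cases = exposure_odds_ratio(
        cases_exposed, cases_unexposed,
        n_exposed - cases_exposed, n_unexposed - cases_unexposed)

    return risk_ratio, or_study_base, or_non_cases


# Hypothetical study base with a common outcome (risks of 40% and 20%).
rr, or_base, or_non = compare_control_sampling(10_000, 10_000, 0.4, 0.2)
print(f"risk ratio = {rr:.2f}")
print(f"odds ratio, controls from study base = {or_base:.2f}")
print(f"odds ratio, controls from non-cases  = {or_non:.2f}")
```

Under these assumptions, the odds ratio with controls drawn from the whole study base equals the risk ratio (2.00), whereas the odds ratio with controls drawn only from non-cases is about 2.67. When the outcome is rare in both groups (for example, risks of 0.4% and 0.2%), the two odds ratios nearly coincide.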

Matching may improve the efficiency of case-control designs. However, a common misunderstanding is that matched case-control studies need only involve collecting data on a convenient sample of cases and a convenient sample of people who are matched to the cases on a few variables [8]. This is not correct. As far back as 1986 Rothman said that:

“... because [case-control studies] need not be expensive nor time-consuming to conduct ... many studies have been conducted by would-be investigators who lack even a rudimentary appreciation for epidemiologic principles. Occasionally such haphazard research can produce fruitful or even extremely important results, but often the results are wrong because basic research principles have been violated” (cited p. 431 [9]).

Importantly, a matched case-control design still requires that cases be people from the study base with the condition of interest, and controls still need to be sampled from the same study base without regard to exposure. In a matched design, there is the additional complexity that cases and controls are matched on variables that are likely to confound estimates (e.g. time since injury or age). A well-conducted matched case-control design may be more efficient, and may therefore require a smaller sample size, than an unmatched study. However, matching on a variable that is not actually a confounder may reduce efficiency. Moreover, matching on a variable that is affected by the exposure (a mediator) or is affected by both the exposure and the outcome (a collider) may introduce bias. For example, while it might be tempting to match on smoking status because those who smoke are more likely to develop bladder cancer [5,6], this would only be necessary if smoking status influences the likelihood of using indwelling catheters (which would seem unlikely). It is therefore important to consider whether a variable is a true confounder before matching on it [10].

Spinal Cord values carefully designed case-control studies because they provide a very efficient way of estimating the causal effect of an exposure on the risk of developing a rare condition. However, they need to be grounded in key epidemiological principles to ensure that the results are trustworthy.