Selection bias in rheumatic disease research

Journal name:
Nature Reviews Rheumatology
Year published:
Published online


The identification of modifiable risk factors for the development of rheumatic conditions and their sequelae is crucial for reducing the substantial worldwide burden of these diseases. However, the validity of such research can be threatened by sources of bias, including confounding, measurement and selection biases. In this Review, we discuss potentially major issues of selection bias—a type of bias frequently overshadowed by other bias and feasibility issues, despite being equally or more problematic—in key areas of rheumatic disease research. We present index event bias (a type of selection bias) as one of the potentially unifying reasons behind some unexpected findings, such as the 'risk factor paradox'—a phenomenon exemplified by the discrepant effects of certain risk factors on the development versus the progression of osteoarthritis (OA) or rheumatoid arthritis (RA). We also discuss potential selection biases owing to differential loss to follow-up in RA and OA research, as well as those due to the depletion of susceptibles (prevalent user bias) and immortal time bias. The lesson remains that selection bias can be ubiquitous and, therefore, has the potential to lead the field astray. Thus, we conclude with suggestions to help investigators avoid such issues and limit the impact on future rheumatology research.

At a glance


    Figure 1: A causal diagram illustration of index event bias, also known as collider stratification bias.

    A causal diagram consists of a set of relevant variables (for example, exposures, potential confounders, and outcomes) and arrows to indicate the flow of causation between those variables. When there are multiple independent causes for an effect (i.e. a common effect), conditioning on this common effect (i.e. selecting only scenarios in which the effect is observed) leads to a spurious association between those causes. A classic, simple example of a coin toss (cause) and a ringing bell (effect) can illustrate the logic behind this phenomenon. In this experiment involving two coins and a bell, the bell rings whenever either coin comes up heads on a toss of both coins. Thus, the bell ringing is a common effect of heads appearing on the toss of either coin. In causal diagrams, this is depicted as colliding causal arrows on a given common effect variable (which gives the name 'collider'). Obviously, heads appearing from one coin toss is independent of heads appearing from the other coin toss; thus, these two causes are mutually independent with a correlation coefficient between the two of 0. However, if we calculate the correlation from only the events when the bell rings (i.e. we condition on the common effect of the bell ringing), the appearances of heads on the two coins are no longer independent, resulting in a correlation coefficient of −0.5. This discrepancy occurs because if coin A came up tails, then that must mean that coin B came up heads (and vice versa), as we know that the bell rang. This simple experiment demonstrates that conditioning on a common effect induces a negative correlation between two causes or 'risk factors'. Conditioning is marked by a box around the variable name, and the spurious association is marked by a dotted line between variables, as per causal diagram convention.
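The two-coin example above can be checked by direct enumeration. The following is a minimal sketch, assuming fair coins and equally likely outcomes; the function and variable names are illustrative only and do not come from the article.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def correlation(outcomes):
    """Pearson correlation of two 0/1 variables over equally likely outcomes."""
    n = len(outcomes)
    mean_a = Fraction(sum(a for a, _ in outcomes), n)
    mean_b = Fraction(sum(b for _, b in outcomes), n)
    cov = Fraction(sum(a * b for a, b in outcomes), n) - mean_a * mean_b
    # For a 0/1 variable the variance is mean * (1 - mean).
    var_a = mean_a * (1 - mean_a)
    var_b = mean_b * (1 - mean_b)
    return float(cov) / sqrt(float(var_a * var_b))


# All four equally likely results of tossing coin A and coin B (1 = heads).
all_tosses = list(product([0, 1], repeat=2))
# Condition on the common effect: keep only tosses where the bell rings.
bell_rings = [(a, b) for a, b in all_tosses if a or b]

corr_all = correlation(all_tosses)
corr_bell = correlation(bell_rings)
print("all tosses:", corr_all)
print("bell rang :", corr_bell)
```

Over all tosses the correlation is 0, whereas restricting to tosses on which the bell rang gives a correlation of approximately -0.5, in line with the value quoted in the legend.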

    Figure 2: A causal diagram of a typical observational study showing the assessment of the effect of obesity on OA progression among patients with (incident) OA.

    Conditioning on (or restricting to) those with OA incidence (i.e. conditioning on a common effect, as explained in Figure 1) results in obesity and the URFs becoming negatively associated, as indicated by a dotted line between obesity and URFs, even though these two factors were not associated before OA incidence. This artificially generated negative confounding results in a biased association between obesity and OA progression (represented as obesity—URFs → OA progression), leading to effect estimation biased towards the null (see Figure 1 legend for details). Abbreviations: OA, osteoarthritis; URFs, unknown or unmeasured risk factors.
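The attenuation described in this legend can be illustrated with exact arithmetic. The sketch below is not the authors' calculation: every probability in it is a hypothetical value chosen for illustration, obesity and a single binary URF are assumed independent at baseline (prevalence 1/2 each), and both are assumed to raise the risk of incident OA and of progression additively.

```python
from fractions import Fraction as F


def p_incident_oa(obese, urf):
    # Hypothetical risks: both factors independently raise the risk of incident OA.
    return F(5, 100) + F(15, 100) * obese + F(15, 100) * urf


def p_progression(obese, urf):
    # Hypothetical risks: obesity has a true (harmful) effect on progression.
    return F(20, 100) + F(10, 100) * obese + F(40, 100) * urf


def progression_risk_ratio(restrict_to_oa):
    """Return (risk ratio obese vs non-obese, URF prevalence by obesity status)."""
    risk = {}
    urf_prevalence = {}
    for obese in (0, 1):
        # Independence at baseline: each (obese, urf) stratum has weight 1/4.
        weight = {urf: F(1, 4) for urf in (0, 1)}
        if restrict_to_oa:
            # Conditioning on the collider: keep only those who develop OA.
            weight = {u: w * p_incident_oa(obese, u) for u, w in weight.items()}
        total = sum(weight.values())
        urf_prevalence[obese] = weight[1] / total
        risk[obese] = sum(w * p_progression(obese, u) for u, w in weight.items()) / total
    return risk[1] / risk[0], urf_prevalence


rr_all, urf_all = progression_risk_ratio(restrict_to_oa=False)
rr_oa, urf_oa = progression_risk_ratio(restrict_to_oa=True)
print("Unrestricted  : RR =", float(rr_all), "URF prevalence =", urf_all)
print("Incident OA   : RR =", float(rr_oa), "URF prevalence =", urf_oa)
```

Under these assumed inputs the URF is equally common in obese and non-obese people before restriction, but after restriction to incident OA it is less common among the obese, and the unadjusted risk ratio for progression moves from 1.25 towards the null (about 1.07).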

    Figure 3: A causal diagram of a typical observational study showing the assessment of the effect of smoking on RA progression (or CVD complications) among patients with RA.

    As in Figures 1 and 2, we consider independent risk factors (specifically, smoking and URFs) that are associated with both RA and RA progression (or CVD). Note that URFs are not associated with smoking (as indicated by the absence of a line between the two factors) before individuals develop RA. Thus, URFs would not be a confounder in a study of smoking and RA progression (or CVD) in the general population. However, smoking and URFs are no longer independent (as indicated by a dotted line between them) following conditioning on a common effect (in this case, restriction of the study sample to patients with RA, as denoted by a box around RA). Consequently, a biased association occurs between smoking and RA progression (or CVD) (represented as smoking—URFs → RA progression [or CVD]). Because the study design makes this spurious association with URFs operate as a negative confounder, the resulting effect measure is underestimated or reversed (that is, paradoxical) unless the study appropriately adjusts for URFs. Abbreviations: CVD, cardiovascular disease; RA, rheumatoid arthritis; URFs, unknown or unmeasured risk factors.

    Figure 4: Causal diagrams displaying the effect of smoking on CVD.

    a | A causal diagram displaying two causal pathways (direct and indirect) for the total effect of smoking on CVD complications in the general (unselected) population. The box around 'confounders' denotes adjustments. The total effect of smoking on the risk of CVD in this population is the net combined causal effect through both pathways. b | A causal diagram of the total causal effect of smoking on CVD complications among patients with RA (i.e. a restricted population). *Theoretically, smoking initiation after RA onset would be equivalent to smoking exposure in the general population in part a; however, in practice, this would be unusual after RA onset. Alternatively, the impact of smoking cessation can be evaluated in these studies. Abbreviations: CVD, cardiovascular disease; RA, rheumatoid arthritis.

    Figure 5: Differential loss to follow-up in studies of RA therapy.

    a,b | Two observational pharmaco-epidemiological studies [46, 47] showed high and differential loss rates between groups. c | By contrast, much lower levels of loss to follow-up were observed in a randomized trial [46] of a biologic agent in RA at similar time points. Despite effectively controlling for confounders in the observational studies [46, 47], such a high level of differential loss to follow-up threatens the embedded assumption that loss to follow-up is completely random (i.e. not associated with an outcome, or mediators of an outcome), leaving the study design open to potential selection bias. Abbreviation: RA, rheumatoid arthritis.
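Why non-random loss to follow-up matters can be shown with a small worked calculation. This is a sketch under assumed numbers, not data from the cited studies: two hypothetical groups (A and B) share the same true outcome risk, but in group A participants who are heading towards the outcome are more likely to be lost before it is recorded.

```python
from fractions import Fraction as F


def observed_risk(true_risk, p_lost_if_event, p_lost_if_no_event):
    """Outcome risk among participants who remain under follow-up."""
    events_seen = true_risk * (1 - p_lost_if_event)
    non_events_seen = (1 - true_risk) * (1 - p_lost_if_no_event)
    return events_seen / (events_seen + non_events_seen)


# Hypothetical inputs: the true risk is identical in both groups.
TRUE_RISK = F(20, 100)

# Group A: loss depends on the outcome (50% vs 10% lost).
risk_a = observed_risk(TRUE_RISK, F(50, 100), F(10, 100))
# Group B: loss is unrelated to the outcome (10% lost either way).
risk_b = observed_risk(TRUE_RISK, F(10, 100), F(10, 100))

observed_ratio = risk_a / risk_b
print("observed risk, group A:", float(risk_a))
print("observed risk, group B:", float(risk_b))
print("observed risk ratio   :", float(observed_ratio))
```

With these assumed loss probabilities, group B recovers the true risk of 0.20, whereas group A appears to have a risk of about 0.12, so the completers-only comparison yields a risk ratio of about 0.61 even though there is no true difference between the groups.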

    Figure 6: Immortal time bias as a form of selection bias.

    Immortal time bias is introduced as a form of selection bias in cohort studies when a period of 'immortal time' is excluded from the analysis. This exclusion occurs because the start of follow-up for the group receiving treatment (a biologic DMARD in this example) is defined by the start of treatment and is, by design (or by practice pattern), later than that for the comparison group (receiving a conventional DMARD). a | A depiction of the comparison group's follow-up starting at the time of RA diagnosis. b | A depiction of the comparison group's follow-up starting sometime after RA diagnosis (matched on certain time factors other than RA duration), but before biologic DMARD use. In both cases, unless the excluded period of nonbiologic agent use before biologic agent use (i.e. the unexposed immortal time) is appropriately assigned to the nonbiologic group in a time-varying manner [64], the immortal time-induced selection bias could lead to a major survival advantage for biologic agent users. Abbreviation: RA, rheumatoid arthritis.
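The scenario in panel a can be sketched with a small simulation. All settings here are assumptions made for illustration (a constant event rate of 0.1 per year that treatment does not change, a 5-year horizon, and half of the patients scheduled to switch to a biologic 1 to 3 years after diagnosis); they are not taken from the article or from reference 64. Because the treatment has no effect in this simulated cohort, any rate ratio away from 1 reflects bias.

```python
import random


def simulate(n=50_000, hazard=0.1, horizon=5.0, seed=0):
    """Return (naive rate ratio, time-varying rate ratio), biologic vs conventional."""
    rng = random.Random(seed)
    # Each entry holds [number of events, person-years].
    naive = {"biologic": [0, 0.0], "conventional": [0, 0.0]}
    time_varying = {"biologic": [0, 0.0], "conventional": [0, 0.0]}

    for i in range(n):
        event_time = rng.expovariate(hazard)   # years from RA diagnosis
        end = min(event_time, horizon)         # end of follow-up
        had_event = event_time < horizon
        # Hypothetical practice pattern: every second patient is scheduled to switch.
        planned_start = rng.uniform(1.0, 3.0) if i % 2 == 0 else None

        if planned_start is not None and planned_start < end:
            # Patient survived event-free long enough to start the biologic.
            for table in (naive, time_varying):
                table["biologic"][0] += had_event
                table["biologic"][1] += end - planned_start
            # Naive analysis drops the pre-treatment (immortal) time;
            # the time-varying analysis assigns it to the conventional group.
            time_varying["conventional"][1] += planned_start
        else:
            for table in (naive, time_varying):
                table["conventional"][0] += had_event
                table["conventional"][1] += end

    def rate_ratio(table):
        rate = {group: events / years for group, (events, years) in table.items()}
        return rate["biologic"] / rate["conventional"]

    return rate_ratio(naive), rate_ratio(time_varying)


naive_rr, time_varying_rr = simulate()
print("naive rate ratio       :", round(naive_rr, 2))
print("time-varying rate ratio:", round(time_varying_rr, 2))
```

In this sketch the naive analysis makes the biologic look protective, whereas assigning the pre-treatment person-time to the conventional group brings the rate ratio back close to 1, the value expected when treatment has no effect.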


  1. Reginster, J. Y. The prevalence and burden of arthritis. Rheumatology (Oxford) 41 (Suppl. 1), 3–6 (2002).
  2. Symmons, D. P. & Gabriel, S. E. Epidemiology of CVD in rheumatic disease, with a focus on RA and SLE. Nat. Rev. Rheumatol. 7, 399–408 (2011).
  3. Gabriel, S. E. Heart disease and rheumatoid arthritis: understanding the risks. Ann. Rheum. Dis. 69 (Suppl. 1), i61–i64 (2010).
  4. Eder, L. et al. The association between smoking and the development of psoriatic arthritis among psoriasis patients. Ann. Rheum. Dis. 71, 219–224 (2012).
  5. Zhang, Y. et al. Methodologic challenges in studying risk factors for progression of knee osteoarthritis. Arthritis Care Res. (Hoboken) 62, 1527–1532 (2010).
  6. Canto, J. G. et al. Number of coronary heart disease risk factors and mortality in patients with first myocardial infarction. JAMA 306, 2120–2127 (2011).
  7. Zhang, Y. & Jordan, J. M. Epidemiology of osteoarthritis. Rheum. Dis. Clin. North Am. 34, 515–529 (2008).
  8. Felson, D. T. et al. Osteoarthritis: new insights. Part 1: the disease and its risk factors. Ann. Intern. Med. 133, 635–646 (2000).
  9. Belo, J. N., Berger, M. Y., Reijman, M., Koes, B. W. & Bierma-Zeinstra, S. M. Prognostic factors of progression of osteoarthritis of the knee: a systematic review of observational studies. Arthritis Rheum. 57, 13–26 (2007).
  10. Zhang, Y. et al. Bone mineral density and risk of incident and progressive radiographic knee osteoarthritis in women: the Framingham Study. J. Rheumatol. 27, 1032–1037 (2000).
  11. Hart, D. J. et al. The relationship of bone density and fracture to incident and progressive radiographic osteoarthritis of the knee: the Chingford Study. Arthritis Rheum. 46, 92–99 (2002).
  12. Lane, N. E. et al. Wnt signaling antagonists are potential prognostic biomarkers for the progression of radiographic hip osteoarthritis in elderly Caucasian women. Arthritis Rheum. 56, 3319–3325 (2007).
  13. McAlindon, T. E. et al. Do antioxidant micronutrients protect against the development and progression of knee osteoarthritis? Arthritis Rheum. 39, 648–656 (1996).
  14. Vesperini, V. et al. Tobacco exposure reduces radiographic progression in early rheumatoid arthritis. Results from the ESPOIR cohort. Arthritis Care Res. (Hoboken) 65, 1899–1906 (2013).
  15. Harrison, B. J., Silman, A. J., Wiles, N. J., Scott, D. G. & Symmons, D. P. The association of cigarette smoking with disease outcome in patients with early inflammatory polyarthritis. Arthritis Rheum. 44, 323–330 (2001).
  16. Finckh, A., Dehler, S., Costenbader, K. H. & Gabay, C. Cigarette smoking and radiographic progression in rheumatoid arthritis. Ann. Rheum. Dis. 66, 1066–1071 (2007).
  17. Gonzalez, A. et al. Do cardiovascular risk factors confer the same risk for cardiovascular outcomes in rheumatoid arthritis patients as in non-rheumatoid arthritis patients? Ann. Rheum. Dis. 67, 64–69 (2008).
  18. Naranjo, A. et al. Cardiovascular disease in patients with rheumatoid arthritis: results from the QUEST-RA study. Arthritis Res. Ther. 10, R30 (2008).
  19. Manson, J. E. et al. A prospective study of obesity and risk of coronary heart disease in women. N. Engl. J. Med. 322, 882–889 (1990).
  20. Escalante, A., Haas, R. W. & del Rincon, I. Paradoxical effect of body mass index on survival in rheumatoid arthritis: role of comorbidity and systemic inflammation. Arch. Intern. Med. 165, 1624–1629 (2005).
  21. Wilson, P. W. et al. Prediction of coronary heart disease using risk factor categories. Circulation 97, 1837–1847 (1998).
  22. Myasoedova, E. et al. Lipid paradox in rheumatoid arthritis: the impact of serum lipid measures and systemic inflammation on the risk of cardiovascular disease. Ann. Rheum. Dis. 70, 482–487 (2011).
  23. Peters, M. J. et al. EULAR evidence-based recommendations for cardiovascular risk management in patients with rheumatoid arthritis and other forms of inflammatory arthritis. Ann. Rheum. Dis. 69, 325–331 (2009).
  24. Solomon, D. H., Peters, M. J., Nurmohamed, M. T. & Dixon, W. Unresolved questions in rheumatology: motion for debate: the data support evidence-based management recommendations for cardiovascular disease in rheumatoid arthritis. Arthritis Rheum. 65, 1675–1683 (2013).
  25. Li, W., Han, J. & Qureshi, A. A. Smoking and risk of incident psoriatic arthritis in US women. Ann. Rheum. Dis. 71, 804–808 (2011).
  26. Bowcock, A. M. & Cookson, W. O. The genetics of psoriasis, psoriatic arthritis and atopic dermatitis. Hum. Mol. Genet. 13 (Suppl. 1), R43–R55 (2004).
  27. Duffin, K. C. et al. Genetics of psoriasis and psoriatic arthritis: update and future direction. J. Rheumatol. 35, 1449–1453 (2008).
  28. Aune, E., Roislien, J., Mathisen, M., Thelle, D. S. & Otterstad, J. E. The “smoker's paradox” in patients with acute coronary syndrome: a systematic review. BMC Med. 9, 97 (2011).
  29. Romero-Corral, A. et al. Association of bodyweight with total mortality and with cardiovascular events in coronary artery disease: a systematic review of cohort studies. Lancet 368, 666–678 (2006).
  30. Lavie, C. J., De Schutter, A., Patel, D., Artham, S. M. & Milani, R. V. Body composition and coronary heart disease mortality—an obesity or a lean paradox? Mayo Clin. Proc. 86, 857–864 (2011).
  31. Dahabreh, I. J. & Kent, D. M. Index event bias as an explanation for the paradoxes of recurrence risk research. JAMA 305, 822–823 (2011).
  32. Kent, D. M. & Thaler, D. E. Is patent foramen ovale a modifiable risk factor for stroke recurrence? Stroke 41, S26–S30 (2010).
  33. Tyas, S. L. et al. Transitions to mild cognitive impairments, dementia, and death: findings from the Nun Study. Am. J. Epidemiol. 165, 1231–1238 (2007).
  34. Glymour, M. M. Invited commentary: when bad genes look good—APOE*E4, cognitive decline, and diagnostic thresholds. Am. J. Epidemiol. 165, 1239–1246; author reply 1247 (2007).
  35. Baglin, T. Unraveling the thrombophilia paradox: from hypercoagulability to the prothrombotic state. J. Thromb. Haemost. 8, 228–233 (2010).
  36. Hernandez-Diaz, S., Schisterman, E. F. & Hernan, M. A. The birth weight “paradox” uncovered? Am. J. Epidemiol. 164, 1115–1120 (2006).
  37. Myers, J. et al. The obesity paradox and weight loss. Am. J. Med. 124, 924–930 (2011).
  38. VanderWeele, T. J., Mumford, S. L. & Schisterman, E. F. Conditioning on intermediates in perinatal epidemiology. Epidemiology 23, 1–9 (2011).
  39. VanderWeele, T. J. & Robins, J. M. Directed acyclic graphs, sufficient causes, and the properties of conditioning on a common effect. Am. J. Epidemiol. 166, 1096–1104 (2007).
  40. Smits, L. J. et al. Index event bias—a numerical example. J. Clin. Epidemiol. 66, 192–196 (2013).
  41. Westreich, D. & Greenland, S. The Table 2 fallacy: presenting and interpreting confounder and modifier coefficients. Am. J. Epidemiol. 177, 292–298 (2013).
  42. Valeri, L. & VanderWeele, T. J. Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychol. Methods 18, 137–150 (2013).
  43. Zhang, Y. et al. What effect is really being measured? An alternative explanation of paradoxical phenomenon in studies of osteoarthritis progression. Arthritis Care & Res (Hoboken)
  44. Felson, D. T. et al. Risk factors for incident radiographic knee osteoarthritis in the elderly: the Framingham Study. Arthritis Rheum. 40, 728–733 (1997).
  45. Cooper, C. et al. Risk factors for the incidence and progression of radiographic knee osteoarthritis. Arthritis Rheum. 43, 995–1000 (2000).
  46. Grijalva, C. G. et al. Initiation of tumor necrosis factor-alpha antagonists and the risk of hospitalization for infection in patients with autoimmune diseases. JAMA 306, 2331–2339 (2011).
  47. Solomon, D. H. et al. Association between disease-modifying antirheumatic drugs and diabetes risk in patients with rheumatoid arthritis and psoriasis. JAMA 305, 2525–2531 (2011).
  48. O'Dell, J. R. et al. Therapies for active rheumatoid arthritis after methotrexate failure. N. Engl. J. Med. 369, 307–318 (2013).
  49. O'Dell, J. R. et al. Treatment of rheumatoid arthritis with methotrexate and hydroxychloroquine, methotrexate and sulfasalazine, or a combination of the three medications: results of a two-year, randomized, double-blind, placebo-controlled trial. Arthritis Rheum. 46, 1164–1170 (2002).
  50. O'Dell, J. R. et al. Treatment of rheumatoid arthritis with methotrexate alone, sulfasalazine and hydroxychloroquine, or a combination of all three medications. N. Engl. J. Med. 334, 1287–1291 (1996).
  51. Dixon, W. & Felson, D. T. Is anti-TNF therapy safer than previously thought? JAMA 306, 2380–2381 (2011).
  52. Hernan, M. A., Hernandez-Diaz, S. & Robins, J. M. Randomized trials analyzed as observational studies. Ann. Intern. Med.
  53. Bongartz, T. et al. Anti-TNF antibody therapy in rheumatoid arthritis and the risk of serious infections and malignancies: systematic review and meta-analysis of rare harmful effects in randomized controlled trials. JAMA 295, 2275–2285 (2006).
  54. Little, R. J. et al. The prevention and treatment of missing data in clinical trials. N. Engl. J. Med. 367, 1355–1360 (2012).
  55. Doll, R. & Hill, A. B. Mortality of British doctors in relation to smoking: observations on coronary thrombosis. Natl Cancer Inst. Monogr. 19, 205–268 (1966).
  56. Wolfe, F. & Michaud, K. Effect of body mass index on mortality and clinical status in rheumatoid arthritis. Arthritis Care Res. (Hoboken) 64, 1471–1479 (2012).
  57. Nguyen, U. S., Niu, J., Choi, H. K. & Zhang, Y. Body mass index and mortality: comment on article by Wolfe and Michaud. Arthritis Care Res. (Hoboken) 65, 834–835 (2013).
  58. Choi, H. K. et al. The risk of pulmonary embolism and deep vein thrombosis in rheumatoid arthritis: a UK population-based outpatient cohort study. Ann. Rheum. Dis. 72, 1182–1187 (2013).
  59. Zoller, B., Li, X., Sundquist, J. & Sundquist, K. Risk of pulmonary embolism in patients with autoimmune disorders: a nationwide follow-up study from Sweden. Lancet 379, 244–249 (2012).
  60. Grodstein, F. & Stampfer, M. The epidemiology of coronary heart disease and estrogen replacement in postmenopausal women. Prog. Cardiovasc. Dis. 38, 199–210 (1995).
  61. Grady, D. et al. Hormone therapy to prevent disease and prolong life in postmenopausal women. Ann. Intern. Med. 117, 1016–1037 (1992).
  62. Manson, J. E. et al. Estrogen plus progestin and the risk of coronary heart disease. N. Engl. J. Med. 349, 523–534 (2003).
  63. Hernan, M. A. et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology 19, 766–779 (2008).
  64. Levesque, L. E., Hanley, J. A., Kezouh, A. & Suissa, S. Problem of immortal time bias in cohort studies: example using statins for preventing progression of diabetes. BMJ 340, b5087 (2010).
  65. Tsai, C. L. & Camargo, C. A. Jr. Methodological considerations, such as directed acyclic graphs, for studying “acute on chronic” disease epidemiology: chronic obstructive pulmonary disease example. J. Clin. Epidemiol. 62, 982–990 (2009).
  66. Rich, J. D. et al. Prior aspirin use and outcomes in acute coronary syndromes. J. Am. Coll. Cardiol. 56, 1376–1385 (2010).


Author information


  1. Section of Rheumatology and Clinical Epidemiology Research and Training Unit, Boston University School of Medicine, 650 Albany Street, Suite 200, Boston, MA 02118, USA.

    • Hyon K. Choi
  2. Clinical Epidemiology Research and Training Unit, Boston University School of Medicine, 650 Albany Street, Suite 200, Boston, MA 02118, USA.

    • Uyen-Sa Nguyen,
    • Jingbo Niu &
    • Yuqing Zhang
  3. Department of Global Health and Population, Harvard School of Public Health, 665 Huntington Avenue, Building 1, Room 1107, Boston, MA 02115, USA.

    • Goodarz Danaei


All authors contributed equally to researching the data for the article, discussions of the content, writing the article and editing of the manuscript before submission.

Competing interests statement

The authors declare no competing interests.

Author details

  • Hyon K. Choi

    Hyon K. Choi received his Master's and Doctorate degrees in epidemiology from Harvard University, Cambridge, MA, USA, and he completed his rheumatology fellowship training at Harvard Medical School and Massachusetts General Hospital, Boston, MA, USA, where he served as the Director of Outcomes Research in the Rheumatology Unit. He is now a Professor of Medicine and Public Health at Boston University, and a Research Scientist at Brigham and Women's Hospital, Boston, MA, USA, and at the Arthritis Research Centre of Canada. His main research interest lies in inflammatory arthritis conditions (namely rheumatoid arthritis, gout, and psoriatic arthritis), including risk factors for and consequences of these diseases, as well as the application of advanced epidemiological methods to musculoskeletal diseases.

  • Uyen-Sa Nguyen

    Uyen-Sa Nguyen received her Doctor of Science degree in epidemiology from Boston University, MA, USA, and completed a postdoctoral fellowship in the Musculoskeletal Research Centre at the Hebrew SeniorLife Institute for Aging Research, an affiliate of Harvard Medical School. She is now an Assistant Professor in the Department of Orthopedics and Physical Rehabilitation at the University of Massachusetts Medical School. Her research interests include the study of risk factor paradoxes in rheumatic disease research, as well as risk factors and consequences of musculoskeletal disorders.

  • Jingbo Niu

    Jingbo Niu earned a Doctor of Medicine degree from the Peking Union Medical College, China, and graduated from Boston University with a Doctor of Science degree in epidemiology. She is now a Research Associate Professor at the Boston University School of Medicine and School of Public Health. Her main research interest is the application of advanced epidemiological and statistical methods in studying osteoarthritis, arthritis-related pain, and other musculoskeletal conditions.

  • Goodarz Danaei

    Goodarz Danaei is an Assistant Professor of Global Health in both the Department of Global Health and Population and the Department of Epidemiology at the Harvard School of Public Health. His global health research focuses on estimating the effect of risk factors and preventive interventions on noncommunicable disease incidence and mortality at the population level, and his epidemiological research applies advanced methods of causal inference to questions of comparative effectiveness research from observational data in the context of cardiovascular diseases and other noncommunicable diseases.

  • Yuqing Zhang

    Yuqing Zhang received a Doctor of Medicine degree from Wuhan Medical College, China. He received his Master's in public health from Sydney University, Australia, and his Doctor of Science in epidemiology from Boston University. Yuqing Zhang is now a Professor of Medicine and Public Health at Boston University. His research interests include applying advanced epidemiological and statistical methods in musculoskeletal disease research, studying risk factors for the occurrence and progression of knee osteoarthritis, assessing triggers for recurrent gout attacks, and conducting pharmaco-epidemiological research using electronic databases.

Supplementary information

Word documents

  1. Supplementary information (92 KB)

    Let us consider an initial cohort of 30,000 participants without rheumatoid arthritis (RA) at baseline and assume that there are four risk factors involved in the aetiology of RA incidence (E1) or progression (E2), namely the risk factor of interest (R) and three other unmeasured risk factors (U1, U2, and U3).
