Table 1: Characteristics of the real datasets.

From: Probabilistic record linkage of de-identified research datasets with discrepancies using diagnosis codes

Dataset | Time span (a) | Number of patients with at least 1 diagnosis code | Number of diagnosis codes recorded at least once | Average number of diagnoses per patient | Average number of diagnoses per patient among silver standard matches
RA1 | 6 years | 26,681 | 7,868 | 30.2 | 33.6
RA2 | 6 years | 5,707 | 4,981 | 29.0 | 33.3
RA2 | 11 years | 6,394 | 6,086 | 44.2 | 54.0
(a) The 6-year time span includes codes from 1/1/2002 through 12/31/2007, while the 11-year time span includes codes from 1/1/2002 through 12/31/2012.