Table 2 Remaining PHI analysis by tool, UCSF test corpus.

From: Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes

PHI category                        Instances of PHI remaining
                                      PHIlter  Physionet  Scrubber
Age ≥ 90                                    0          0         0
Patient_Vehicle_or_Device_Id                0         18         0
Patient_Account_Number                      0         35         4
Patient_Medical_Record_Id                   0        445         0
Patient_Social_Security_Number              0          0         6
Patient_Phone_Fax                           0          0         1
Patient_Initials                            2        120       132
Patient_Name_or_Family_Member_Name          6        211        93
Patient_Address                             7         25        16
Patient_Unique_ID                          20        442        34
Email                                       0          1         1
URL_IP                                      4         20       153
Date                                        7        257       269
Provider_Certificate_or_License             0        276        99
Provider_Name                              12        546        90
Provider_Initials                          12        236       217
Provider_Address_or_Location               43       1597       210
Provider_Phone_Fax                         45         49        43
  1. Counts of PHI remaining after processing by PHIlter, Physionet and Scrubber on the UCSF corpus. Each instance is a single token, so an item of PHI that spans several tokens contributes one instance per token.
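
As a reading aid, the per-tool column totals can be computed directly from the rows above. The sketch below is illustrative only: the `ROWS` list is a transcription of Table 2, the `totals` helper is a hypothetical name introduced here, and the totals are derived by summation rather than reported in the source.

```python
# Rows transcribed from Table 2: (PHI category, PHIlter, Physionet, Scrubber).
# Each count is the number of PHI tokens remaining on the UCSF test corpus.
ROWS = [
    ("Age ≥ 90", 0, 0, 0),
    ("Patient_Vehicle_or_Device_Id", 0, 18, 0),
    ("Patient_Account_Number", 0, 35, 4),
    ("Patient_Medical_Record_Id", 0, 445, 0),
    ("Patient_Social_Security_Number", 0, 0, 6),
    ("Patient_Phone_Fax", 0, 0, 1),
    ("Patient_Initials", 2, 120, 132),
    ("Patient_Name_or_Family_Member_Name", 6, 211, 93),
    ("Patient_Address", 7, 25, 16),
    ("Patient_Unique_ID", 20, 442, 34),
    ("Email", 0, 1, 1),
    ("URL_IP", 4, 20, 153),
    ("Date", 7, 257, 269),
    ("Provider_Certificate_or_License", 0, 276, 99),
    ("Provider_Name", 12, 546, 90),
    ("Provider_Initials", 12, 236, 217),
    ("Provider_Address_or_Location", 43, 1597, 210),
    ("Provider_Phone_Fax", 45, 49, 43),
]

TOOLS = ("PHIlter", "Physionet", "Scrubber")


def totals(rows):
    """Sum the remaining-PHI token counts per tool across all categories.

    Hypothetical helper for illustration; not part of any of the three tools.
    """
    return {tool: sum(row[i + 1] for row in rows) for i, tool in enumerate(TOOLS)}


if __name__ == "__main__":
    for tool, count in totals(ROWS).items():
        print(f"{tool}: {count}")
```

Summing the columns this way gives 158 remaining tokens for PHIlter, 4278 for Physionet and 1368 for Scrubber across the 18 categories.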