Abstract 623 Poster Session I, Saturday, 5/1 (poster 200)

Databases (DB) derived from electronic capture of medical data are becoming increasingly prevalent and are convenient sources of epidemiologic and health service utilization data. The accuracy of diagnosis (Dx) data in such DB is often unknown. In addition to data entry errors, the data capture methods may lead to incomplete listings of, or forced choices for, Dx that reduce the precision with which the DB represents the clinical picture of the population under study. This study assessed the impact of different measures of matching on the apparent accuracy of such a DB. METHODS: A DB of clinic visits with demographic and Dx data was developed from scannable encounter forms (EF) from a public pediatric clinic. The EF contained 105 ICD-9-based Dx options plus write-in fields. One primary Dx (PDx) and multiple secondary Dx could be marked. From 32,634 visits over 2.5 yrs, a random sample of 317 visits was reviewed for Dx listed in the medical record (MedRec). Two stringency levels (SL) for matching of individual Dx were defined: 1) exact wording or reasonable synonym (e.g., diarrhea ≈ gastroenteritis); 2) imprecise synonym match (e.g., sinusitis ≈ upper respiratory tract infection) or well child Dx with non-well Dx also listed. Six measures of matching were used, taking into account PDx, number of Dx, and direction of matching (MedRec Dx found in DB vs. DB Dx found in MedRec). RESULTS: 1) The PDx in the MedRec matched the PDx in the scanned DB in 84% of cases at the 1st SL and in 88% at the 2nd SL. 2) The PDx in the MedRec was matched by any Dx in the scanned DB in 88% of cases at the 1st SL and 92% at the 2nd. 3) The PDx in the scanned DB matched any Dx in the MedRec in 91% of cases at the 1st SL and 95% at the 2nd. 4) All Dx listed in the MedRec were matched in the scanned DB in 65% of cases at the 1st SL and 68% at the 2nd. 5) All Dx in the scanned DB were matched in the MedRec in 87% of cases at the 1st SL and 90% at the 2nd. 6) At least one Dx in the scanned DB was matched in the MedRec in 92% of cases at the 1st SL and 97% at the 2nd. CONCLUSIONS: The accuracy of diagnosis data captured in this clinical DB, relative to the MedRec, ranged from 65% to 97% depending on the measure of accuracy and the rigor of matching required. This illustrates the importance of validating diagnosis data from electronic sources and the need to select reasonable criteria for matching. Overly stringent requirements may lead to rejection of conclusions drawn from analyses of imperfect but reasonably accurate DB. Overly lax definitions may lead to acceptance of invalid conclusions from an insufficiently accurate DB. Use of two or more measures of varying stringency, such as the degree of matching of the PDx in the MedRec to any Dx in the DB at the 1st SL, plus the matching of all Dx in the DB to those in the MedRec at the 2nd SL, may give a reasonable portrayal of the accuracy of diagnosis data in clinical DB.
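
The six measures of matching can be illustrated with a minimal sketch. This is not the study's software, which the abstract does not describe. The names `Visit`, `dx_match`, `visit_measures`, and `match_rates` are hypothetical. The synonym tables hold only the two example pairs given above. The well child rule of the 2nd SL is omitted, and each visit is assumed to list at least one Dx in both sources, with the PDx first.

```python
from dataclasses import dataclass

# Hypothetical synonym tables: the abstract gives only one example pair per level.
STRICT_SYNONYMS = {frozenset({"diarrhea", "gastroenteritis"})}
LOOSE_SYNONYMS = {frozenset({"sinusitis", "upper respiratory tract infection"})}


@dataclass
class Visit:
    medrec: list[str]  # Dx from the medical record, primary Dx first (assumed layout)
    db: list[str]      # Dx from the scanned database, primary Dx first (assumed layout)


def dx_match(a: str, b: str, level: int) -> bool:
    """Return True if two Dx labels match at stringency level 1 or 2."""
    if a == b:
        return True
    pair = frozenset({a, b})
    if pair in STRICT_SYNONYMS:
        return True
    # Level 2 additionally accepts imprecise synonyms.
    return level >= 2 and pair in LOOSE_SYNONYMS


def visit_measures(v: Visit, level: int) -> dict[str, bool]:
    """Evaluate the six measures of matching for one visit."""
    def found(dx: str, pool: list[str]) -> bool:
        return any(dx_match(dx, other, level) for other in pool)

    return {
        "1_medrec_pdx_is_db_pdx": dx_match(v.medrec[0], v.db[0], level),
        "2_medrec_pdx_in_any_db_dx": found(v.medrec[0], v.db),
        "3_db_pdx_in_any_medrec_dx": found(v.db[0], v.medrec),
        "4_all_medrec_dx_in_db": all(found(dx, v.db) for dx in v.medrec),
        "5_all_db_dx_in_medrec": all(found(dx, v.medrec) for dx in v.db),
        "6_any_db_dx_in_medrec": any(found(dx, v.medrec) for dx in v.db),
    }


def match_rates(visits: list[Visit], level: int) -> dict[str, float]:
    """Fraction of visits satisfying each measure at the given stringency level."""
    totals: dict[str, int] = {}
    for v in visits:
        for name, ok in visit_measures(v, level).items():
            totals[name] = totals.get(name, 0) + int(ok)
    return {name: count / len(visits) for name, count in totals.items()}
```

Calling `match_rates(visits, 1)` and `match_rates(visits, 2)` on the same sample gives the kind of paired percentages reported in RESULTS. Measures 4 and 5 differ only in the direction of matching, which is why they can diverge when one source lists more Dx than the other.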